Abstract

We consider estimation of the common probability density $f$ of i.i.d. random variables $X_i$ that are observed with an additive i.i.d. noise. We assume that the unknown density $f$ belongs to a class $\mathcal{A}$ of densities whose characteristic function is described by the exponent $\exp(-\alpha|u|^r)$ as $|u| \to \infty$, where $\alpha > 0$, $r > 0$. The noise density is supposed to be known and such that its characteristic function decays as $\exp(-\beta|u|^s)$, as $|u| \to \infty$, where $\beta > 0$, $s > 0$. Assuming that $r < s$, we suggest a kernel type estimator that is optimal in the sharp asymptotic minimax sense on $\mathcal{A}$ simultaneously under the pointwise and the $L_2$-risks. The variance of this estimator turns out to be asymptotically negligible w.r.t. its squared bias. For $r < s/2$ we construct a sharp adaptive estimator of $f$. We discuss some effects of dominating bias, such as superefficiency of minimax estimators.

Mathematics Subject Classifications: 62G05, 62G20 

Key Words: Deconvolution, nonparametric density estimation, infinitely differentiable functions, exact constants in nonparametric smoothing, minimax risk, adaptive curve estimation.

1 Introduction



Assume that one observes $Y_1, \dots, Y_n$ in the model

$$ Y_i = X_i + \varepsilon_i, \quad i = 1, \dots, n, $$

where $X_i$ are i.i.d. random variables with an unknown probability density $f$ w.r.t. the Lebesgue measure on $\mathbb{R}$, the random variables $\varepsilon_i$ are i.i.d. with known probability density $f^\varepsilon$ w.r.t. the Lebesgue measure on $\mathbb{R}$, and $(\varepsilon_1, \dots, \varepsilon_n)$ is independent of $(X_1, \dots, X_n)$. The deconvolution problem that we consider here is to estimate $f$ from observations $Y_1, \dots, Y_n$.
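To make the observation model concrete, the following sketch simulates data from it. The choices made here are illustrative assumptions, not part of the paper: $X_i$ standard Cauchy, $\varepsilon_i$ centered Gaussian with standard deviation `noise_sd`, and the function name `simulate_observations` is hypothetical.

```python
import numpy as np

def simulate_observations(n, noise_sd=0.5, seed=0):
    """Draw Y_i = X_i + eps_i (illustrative choice: X Cauchy, eps Gaussian)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_cauchy(n)           # unobserved X_1, ..., X_n
    eps = rng.normal(0.0, noise_sd, n)   # noise, drawn independently of X
    return x + eps                       # observed Y_1, ..., Y_n

y = simulate_observations(1000)
```

Only the vector `y` and the noise law are available to the statistician; the sample `x` is discarded on purpose.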

Denote by $f^Y = f * f^\varepsilon$ the density of the variables $Y_i$, where $*$ is the convolution sign. Let $\Phi^Y$, $\Phi^X$ and $\Phi^\varepsilon$ be the characteristic functions of the random variables $Y_i$, $X_i$ and $\varepsilon_i$, respectively. For an integrable function $g : \mathbb{R} \to \mathbb{R}$, define the Fourier transform

$$ \Phi^g(u) = \int g(x) \exp(ixu)\,dx. $$

We assume that the unknown density $f$ belongs to the class of functions

$$ \mathcal{A}_{\alpha,r}(L) = \Big\{ f \text{ is a probability density on } \mathbb{R} \text{ and } \int |\Phi^X(u)|^2 \exp(2\alpha|u|^r)\,du \le 2\pi L \Big\}, $$

where $\alpha > 0$, $r > 0$, $L > 0$ are finite constants. The classes of densities of this type have been studied by many authors starting from Ibragimov and Hasminskii (1983). For a recent overview see Belitser and Levit (2001) and Artiles (2001).
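As a quick check of this definition, here is a worked example that is not taken from the paper: the standard Cauchy density, whose characteristic function is $e^{-|u|}$, so that $r = 1$.

```latex
f(x) = \frac{1}{\pi(1+x^2)}, \qquad \Phi^X(u) = e^{-|u|},
\qquad
\int |\Phi^X(u)|^2 e^{2\alpha|u|}\,du
  = 2\int_0^\infty e^{-2(1-\alpha)u}\,du
  = \frac{1}{1-\alpha}, \quad 0 < \alpha < 1 .
```

Hence the Cauchy density belongs to $\mathcal{A}_{\alpha,1}(L)$ whenever $0 < \alpha < 1$ and $L \ge 1/(2\pi(1-\alpha))$, while for $\alpha \ge 1$ the integral diverges.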

We suppose also in most of the results that the characteristic function of the noise $\varepsilon_i$ satisfies the following assumption.

Assumption (N). There exist constants $u_0 > 0$, $\beta > 0$, $s > 0$, $b_{\min} > 0$, $b_{\max} > 0$ and $\gamma, \gamma' \in \mathbb{R}$ such that

$$ b_{\min} |u|^{\gamma} \exp(-\beta|u|^s) \le |\Phi^\varepsilon(u)| \le b_{\max} |u|^{\gamma'} \exp(-\beta|u|^s) \tag{1} $$

for $|u| \ge u_0$.

Many important probability densities belong to the class $\mathcal{A}_{\alpha,r}(L)$ with some $\alpha, r, L$ or have the characteristic function satisfying (1). All such densities are infinitely many times differentiable on $\mathbb{R}$. Examples include normal, Cauchy and general stable laws, Student, logistic, extreme value distributions and others, as well as their mixtures and convolutions. Note that in these examples the values $r$ and/or $s$ are less than or equal to 2. Although the densities with $r > 2$, $s > 2$ are in principle conceivable, they are difficult to express in a closed form, and the set of such densities does not contain statistically famous representatives. This remark concerns especially the noise density $f^\varepsilon$ that should be explicitly known. Therefore, without a meaningful loss, we will sometimes restrict our study to the case $0 < s \le 2$.

For any estimator $\hat f_n$ of $f$ define the maximal pointwise risk over the class $\mathcal{A}_{\alpha,r}(L)$ for any fixed $x \in \mathbb{R}$ by

$$ R_n(x, \hat f_n, \mathcal{A}_{\alpha,r}(L)) = \sup_{f \in \mathcal{A}_{\alpha,r}(L)} E_f\big[ |\hat f_n(x) - f(x)|^2 \big] $$

and the maximal $L_2$-risk

$$ R_n(L_2, \hat f_n, \mathcal{A}_{\alpha,r}(L)) = \sup_{f \in \mathcal{A}_{\alpha,r}(L)} E_f\big[ \|\hat f_n - f\|_2^2 \big], $$

where $E_f(\cdot)$ is the expectation with respect to the joint distribution $P_f$ of $Y_1, \dots, Y_n$, when the underlying probability density of the $X_i$'s is $f$, and $\|\cdot\|_2$ stands for the $L_2(\mathbb{R})$-norm. (In what follows we use the notation $L_p(\mathbb{R})$, in general, for the $L_p$-spaces of complex valued functions on $\mathbb{R}$.)

The asymptotics of optimal estimators differ significantly for the cases r < s, 
r = s and r > s. If r < s the variance of the optimal estimator is asymptotically 
negligible w.r.t. the bias, while for r > s the bias is asymptotically negligible w.r.t. 
the variance. In this paper we consider the bias dominated case, i.e. we assume that 
r < s. The setting with dominating variance will be treated in another paper. 

The problems of density deconvolution with dominating bias were historically the first ones studied in the literature [cf. Ritov (1987), Stefanski and Carroll (1990), Carroll and Hall (1988), Zhang (1990), Fan (1991a,b), Masry (1991), Efromovich (1997)], motivated by the importance of deconvolution with Gaussian noise. These papers consider, in particular, the noise distributions satisfying (1), but the densities $f$ belonging to finite smoothness classes, such as Hölder or Sobolev ones, where the estimation of $f$ is harder than for the class $\mathcal{A}_{\alpha,r}(L)$. In this framework they show that optimal rates of convergence behave as a power of $\log n$, which suggests that essentially there is no hope to recover $f$ with a reasonably small error for reasonable sample sizes. This conclusion is often interpreted as a general pessimistic message about the Gaussian deconvolution problem. Note, however, that such minimax results are obtained for the least favorable densities in Hölder or Sobolev classes. Often the underlying density is much nicer (for instance, it belongs to $\mathcal{A}_{\alpha,r}(L)$, as the popular densities mentioned above), and the estimation can be significantly improved, as we show below: the optimal rates of convergence are in fact faster than any power of $\log n$.

Pensky and Vidakovic (1999) were the first to point out the effect of fast rates in density deconvolution, considering the classes of densities that are somewhat smaller than $\mathcal{A}_{\alpha,r}(L)$ (including an additional restriction on the tails of $f$) and with the noise satisfying (1). They analyzed the rates of convergence of wavelet deconvolution estimators, restricting their attention to the $L_2$-risk. Our results imply that the rates achieved by their estimators are suboptimal on $\mathcal{A}_{\alpha,r}(L)$ and that the optimal rates can be attained by a simpler and more traditional kernel deconvolution method with suitably chosen parameters. We will show that our method attains not only the optimal rates but also the best asymptotic constants (i.e. is sharp optimal). Moreover, we will prove that the proposed estimator is sharp optimal simultaneously under the $L_2$-risk and under the pointwise risk and that it is sharp adaptive to the parameters $\alpha, r, L$ in some cases.

The most difficult part of our results is the construction of minimax lower bounds. The technique that we develop might be useful to get lower bounds for similar "2 exponents" type settings in other inverse problems. To our knowledge, except for the case $r = s = 1$ treated by Golubev and Khasminskii (2001), Tsybakov (2000) and Cavalier, Golubev, Lepski and Tsybakov (2003), such lower bounds are not available even for the Gaussian white noise (or sequence space) deconvolution model, although some upper bounds are known (cf. Ermakov (1989), Efromovich and Koltchinskii (2001)).

Finally, we mention publications on adaptive deconvolution under Assumption (N) or its analogs. They deal with problems that are somewhat different from ours. Efromovich (1997) considered the problem of deconvolution where the densities $f$ and $f^\varepsilon$ are both periodic on $[0, 2\pi]$, $f^\varepsilon$ satisfies an analog of Assumption (N) expressed in terms of Fourier coefficients and $f$ belongs to a class of periodic functions of Sobolev type. He proposed sharp adaptive estimators with logarithmic rates which are optimal for that framework, as discussed above. Adaptive deconvolution in a Gaussian white noise model was studied by Goldenshluger (1998). He worked under Assumption (N) on the Fourier transform of the convolution kernel or under the assumption that it decreases as a power of $u$, as $|u| \to \infty$, but he assumed that the function $f$ to estimate belongs to a Sobolev class with unknown parameters. He proposed a rate adaptive estimator under the pointwise risk.



2 The estimator, its bias and variance



Consider the following kernel estimator of $f$:

$$ \hat f_n(x) = \frac{1}{n h_n} \sum_{i=1}^{n} K_n\Big( \frac{x - Y_i}{h_n} \Big), \tag{2} $$

where $h_n > 0$ is a bandwidth and $K_n$ is the function on $\mathbb{R}$ defined as the inverse Fourier transform of

$$ \Phi^{K_n}(u) = \frac{I(|u| \le 1)}{\Phi^\varepsilon(u/h_n)}. \tag{3} $$

Here and later $I(\cdot)$ denotes the indicator function. The function $K_n$ is called kernel, but unlike the usual Parzen-Rosenblatt kernels, it depends on $n$.

For the existence of $K_n$ it is enough that $\Phi^{K_n} \in L_2(\mathbb{R})$ (and thus $\Phi^{K_n} \in L_1(\mathbb{R})$). This holds under mild assumptions. For example, in view of the continuity property of characteristic functions, the assumption that $\Phi^\varepsilon(u) \ne 0$ for all $u \in \mathbb{R}$ is sufficient to have $\Phi^{K_n} \in L_2(\mathbb{R})$. Moreover, the condition $\Phi^{K_n} \in L_2(\mathbb{R})$ implies that the kernel $K_n$ is real-valued. In fact, under this condition we have $\Phi^{K_n}(u) = \Phi^\varepsilon(-u/h_n) V_n(u)$ for almost all $u \in \mathbb{R}$, where $V_n(u) = I(|u| \le 1)/|\Phi^\varepsilon(u/h_n)|^2$ is an even real-valued function belonging to $L_1(\mathbb{R})$ and $\Phi^\varepsilon(-u/h_n)$ (the complex conjugate of $\Phi^\varepsilon(u/h_n)$) is the Fourier transform of the real-valued function $t \mapsto h_n f^\varepsilon(-h_n t)$. This implies that $K_n$ is a convolution of two real-valued functions.

The estimator (2) belongs to the family of kernel deconvolution estimators studied in many papers starting from Stefanski and Carroll (1990), Carroll and Hall (1988) and Zhang (1990). It can also be deduced from a unified approach to construction of estimators in statistical inverse problems (Ruymgaart (1993)).
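A kernel deconvolution estimator of this type can be evaluated directly in the Fourier domain, since it equals $(2\pi)^{-1}\int_{|u|\le 1/h_n} e^{-iux}\,\hat\Phi^Y(u)/\Phi^\varepsilon(u)\,du$, where $\hat\Phi^Y$ is the empirical characteristic function of the observations. The sketch below is a minimal numerical illustration under assumptions of our own: the function name `deconvolution_estimate`, the trapezoidal quadrature and the grid size are hypothetical choices, and the noise characteristic function is passed in as a callable.

```python
import numpy as np

def deconvolution_estimate(x, y, h, noise_cf, grid_size=1001):
    """Evaluate the spectral cut-off deconvolution estimate at the points x.

    Computes (1/(2*pi)) * integral over |u| <= 1/h of
    exp(-i*u*x) * ecf(u) / noise_cf(u) du by the trapezoidal rule.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    u = np.linspace(-1.0 / h, 1.0 / h, grid_size)
    ecf = np.exp(1j * np.outer(u, y)).mean(axis=1)    # empirical c.f. of Y
    ratio = ecf / noise_cf(u)                         # divide out the noise c.f.
    integrand = np.exp(-1j * np.outer(x, u)) * ratio  # shape (len(x), grid_size)
    du = u[1] - u[0]
    # trapezoidal rule on a uniform grid
    integral = du * (integrand.sum(axis=1)
                     - 0.5 * (integrand[:, 0] + integrand[:, -1]))
    return integral.real / (2.0 * np.pi)

# Illustrative use: Cauchy X, Gaussian noise with sd 0.5, so noise_cf(u) = exp(-u^2/8).
rng = np.random.default_rng(0)
y_obs = rng.standard_cauchy(2000) + rng.normal(0.0, 0.5, 2000)
estimate_at_zero = deconvolution_estimate(0.0, y_obs, 0.4, lambda u: np.exp(-0.125 * u**2))
```

The bandwidth 0.4 is an arbitrary illustrative value, not an optimal one; the true density at zero in this example is $1/\pi$.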

The following proposition establishes upper bounds on the pointwise and the $L_2$ bias terms, i.e. on the quantities $|E_f \hat f_n(x) - f(x)|^2$ and $\|E_f \hat f_n - f\|_2^2$.

Proposition 1 Let $f \in \mathcal{A}_{\alpha,r}(L)$, $\alpha > 0$, $r > 0$, $L > 0$ and assume that $\Phi^{K_n} \in L_2(\mathbb{R})$ for any $h_n > 0$. Then the squared bias of $\hat f_n(x)$ is bounded as follows

$$ \sup_{x \in \mathbb{R}} |E_f \hat f_n(x) - f(x)|^2 \le \frac{L}{2\pi\alpha r}\, h_n^{r-1} \exp\Big( -\frac{2\alpha}{h_n^r} \Big) (1 + o(1)), $$

as $h_n \to 0$, while the bias term of the $L_2$-risk satisfies

$$ \|E_f \hat f_n - f\|_2^2 \le L \exp\Big( -\frac{2\alpha}{h_n^r} \Big) $$

for every $h_n > 0$.

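Proof. For the pointwise bias we have

$$ |E_f \hat f_n(x) - f(x)|^2 = \Big| \frac{1}{h_n} K_n\Big( \frac{\cdot}{h_n} \Big) * f^Y(x) - f(x) \Big|^2 = \Big| \frac{1}{2\pi} \int \big( \Phi^{K_n}(u h_n) \Phi^Y(u) - \Phi^X(u) \big) \exp(-iux)\,du \Big|^2 \le \frac{1}{(2\pi)^2} \Big( \int I(|u h_n| > 1) |\Phi^X(u)|\,du \Big)^2. $$

Applying the Cauchy-Schwarz inequality and the assumption that $f$ belongs to $\mathcal{A}_{\alpha,r}(L)$ we get

$$ |E_f \hat f_n(x) - f(x)|^2 \le \frac{1}{(2\pi)^2} \int_{|u| > 1/h_n} \exp(-2\alpha|u|^r)\,du \int_{|u| > 1/h_n} |\Phi^X(u)|^2 \exp(2\alpha|u|^r)\,du \tag{4} $$

$$ \le \frac{L}{2\pi} \int_{|u| > 1/h_n} \exp(-2\alpha|u|^r)\,du, $$

which together with the corresponding lemma of the Appendix yields the first inequality of the Proposition. To prove the second inequality, we apply the Plancherel formula and get

$$ \|E_f \hat f_n - f\|_2^2 = \Big\| \frac{1}{h_n} K_n\Big( \frac{\cdot}{h_n} \Big) * f^Y - f \Big\|_2^2 = \frac{1}{2\pi} \int |\Phi^{K_n}(u h_n) \Phi^Y(u) - \Phi^X(u)|^2\,du = \frac{1}{2\pi} \int I(|u h_n| > 1) |\Phi^X(u)|^2\,du $$

$$ \le \frac{1}{2\pi} \exp\Big( -\frac{2\alpha}{h_n^r} \Big) \int_{|u| > 1/h_n} |\Phi^X(u)|^2 \exp(2\alpha|u|^r)\,du, \tag{5} $$

and the last integral does not exceed $2\pi L$ by the definition of $\mathcal{A}_{\alpha,r}(L)$. □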



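The next proposition gives upper bounds on the pointwise and the $L_2$ variance terms defined as

$$ \mathrm{Var}_f \hat f_n(x) = E_f |\hat f_n(x) - E_f \hat f_n(x)|^2 \quad \text{and} \quad \mathrm{Var}_{f,2} \hat f_n = E_f \|\hat f_n - E_f \hat f_n\|_2^2 $$

respectively.

Proposition 2 Let the left inequality in (1) hold and $\Phi^\varepsilon(u) \ne 0$, $\forall u \in \mathbb{R}$. Then, for any density $f$ such that $\sup_{x \in \mathbb{R}} f(x) \le f^* < \infty$, the pointwise variance of the estimator $\hat f_n(x)$ is bounded as follows

$$ \sup_{x \in \mathbb{R}} \mathrm{Var}_f \hat f_n(x) = \sup_{x \in \mathbb{R}} E_f |\hat f_n(x) - E_f \hat f_n(x)|^2 \le \min\Big\{ \frac{f^* h_n^{s+2\gamma-1}}{2\pi\beta s b_{\min}^2},\ \frac{h_n^{2s+2\gamma-2}}{\pi^2\beta^2 s^2 b_{\min}^2} \Big\} \frac{1}{n} \exp\Big( \frac{2\beta}{h_n^s} \Big) (1 + o(1)), \tag{6} $$

as $h_n \to 0$, and, for an arbitrary density $f$, the variance term of the $L_2$-risk satisfies

$$ \mathrm{Var}_{f,2} \hat f_n = E_f \|\hat f_n - E_f \hat f_n\|_2^2 \le \frac{h_n^{s+2\gamma-1}}{2\pi\beta s b_{\min}^2 n} \exp\Big( \frac{2\beta}{h_n^s} \Big) (1 + o(1)), \tag{7} $$

as $h_n \to 0$.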



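Proof. For the pointwise variance we obtain two separate bounds and then take the minimum of them. To get the first bound, we write

$$ \mathrm{Var}_f \hat f_n(x) = \frac{1}{n} \mathrm{Var}_f\Big( \frac{1}{h_n} K_n\Big( \frac{x - Y_1}{h_n} \Big) \Big) \le \frac{1}{n h_n^2} E_f\Big[ K_n^2\Big( \frac{x - Y_1}{h_n} \Big) \Big] $$

$$ = \frac{1}{n h_n^2} \int K_n^2\Big( \frac{x - y}{h_n} \Big) f^Y(y)\,dy $$

$$ \le \frac{f^*}{n h_n} \|K_n\|_2^2, \tag{8} $$

where we used the fact that the convolution density $f^Y = f * f^\varepsilon$ is uniformly bounded by $f^*$. Applying the Plancherel formula and using (1) and (64) of the corresponding lemma in the Appendix we get

$$ \|K_n\|_2^2 = \frac{h_n}{2\pi} \int_{|u| \le 1/h_n} |\Phi^\varepsilon(u)|^{-2}\,du $$

$$ \le \frac{h_n}{2\pi b_{\min}^2} \int_{u_0 \le |u| \le 1/h_n} |u|^{-2\gamma} \exp(2\beta|u|^s)\,du + \frac{h_n}{2\pi} \int_{|u| \le u_0} |\Phi^\varepsilon(u)|^{-2}\,du $$

$$ = \frac{h_n}{\pi b_{\min}^2} \int_{u_0}^{1/h_n} u^{-2\gamma} \exp(2\beta u^s)\,du + O(h_n) $$

$$ = \frac{h_n^{s+2\gamma}}{2\pi b_{\min}^2 \beta s} \exp\Big( \frac{2\beta}{h_n^s} \Big) (1 + o(1)), \quad h_n \to 0. \tag{9} $$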
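This and (8) imply the first bound in (6). For the second bound we still use the second line in (8), but then we apply the Plancherel formula in a different way:

$$ \mathrm{Var}_f \hat f_n(x) \le \frac{1}{n} \int K_{h,n}^2(x - y) f^Y(y)\,dy \le \frac{1}{2\pi n} \int \big| \overline{\Phi}^{K_{h,n}^2}(u)\, \Phi^Y(u) \big|\,du \le \frac{1}{2\pi n} \int \big| \Phi^{K_{h,n}^2}(u) \big|\,du, $$

where $K_{h,n}(\cdot) = K_n(\cdot/h_n)/h_n$, $\Phi^{K_{h,n}^2}$ is the Fourier transform of $K_{h,n}^2$ and $\overline{\Phi}$ is the complex conjugate of $\Phi$. Thus, using that $\Phi^{K_{h,n}}(u) = \Phi^{K_n}(h_n u)$ and then acting similarly to (9) we get

$$ \mathrm{Var}_f \hat f_n(x) \le \frac{1}{(2\pi)^2 n} \Big( \int |\Phi^{K_n}(h_n u)|\,du \Big)^2 = \frac{1}{(2\pi)^2 n} \Big( \int_{|u| \le 1/h_n} |\Phi^\varepsilon(u)|^{-1}\,du \Big)^2 \le \frac{h_n^{2s+2\gamma-2}}{\pi^2\beta^2 s^2 b_{\min}^2 n} \exp\Big( \frac{2\beta}{h_n^s} \Big) (1 + o(1)), $$

which yields the second bound in (6). Finally,

$$ \mathrm{Var}_{f,2} \hat f_n \le \frac{1}{n h_n^2} \int\!\!\int K_n^2\Big( \frac{x - y}{h_n} \Big) f^Y(y)\,dy\,dx = \frac{1}{n h_n} \|K_n\|_2^2, $$

and in view of (9) we obtain (7). □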

Clearly, the bounds of Proposition 2 can be applied to $f \in \mathcal{A}_{\alpha,r}(L)$ with, for example,

$$ f^* = \sup_{f \in \mathcal{A}_{\alpha,r}(L)} \sup_{x \in \mathbb{R}} |f(x)|. $$

This value is finite and can be taken as in the corresponding lemma of the Appendix.

Interestingly, inequality (6) shows that asymptotics of the pointwise variance are different for $0 < s < 1$ and $s > 1$, while this is not the case for the $L_2$ variance term given by (7). Inequality (6) can be compared to the recent result of van Es and Uh (2004). They studied the asymptotic pointwise variance of the same deconvolution kernel estimator in the particular case of stable noise distributions with $1/3 < s < 2$ and also noticed that $s = 1$ marks a change of behaviour. These effects concerning variance terms will not be crucial in what follows since we will consider the bias dominated case.

3 Optimal bandwidths and upper bounds for the risks

Propositions 1 and 2 lead to upper bounds for pointwise and $L_2$ risks that can be minimized in $h_n$. In this section we give an asymptotic approximation for the result of such a minimization assuming that $r < s$. The corresponding solutions $h_n$ will be called optimal bandwidths. Note that here we consider only optimization within a given class of estimators; moreover, we minimize upper bounds on the risks and not the exact risks. However, this turns out to be precise enough in the asymptotic sense: in the next section we will show that the estimator $\hat f_n$ with optimal bandwidth is sharp minimax over all possible estimators.

Decomposition of the mean squared error of the kernel estimator into bias and variance terms and application of Propositions 1 and 2 yields

$$ E_f |\hat f_n(x) - f(x)|^2 = |E_f \hat f_n(x) - f(x)|^2 + \mathrm{Var}_f \hat f_n(x) \le \Big[ \frac{L h_n^{r-1}}{2\pi\alpha r} \exp\Big( -\frac{2\alpha}{h_n^r} \Big) + \frac{f^* h_n^{s+2\gamma-1}}{2\pi\beta s b_{\min}^2 n} \exp\Big( \frac{2\beta}{h_n^s} \Big) \Big] (1 + o(1)). $$

We now minimize the last expression in $h_n$. Clearly, the minimizer $h_n$ tends to 0, as $n \to \infty$. Taking derivatives with respect to $h_n$ and neglecting the smaller terms lead us to the equation for the optimal bandwidth

$$ \frac{L b_{\min}^2}{f^*}\, n\, h_n^{-2\gamma} (1 + o(1)) = \exp\Big( \frac{2\beta}{h_n^s} + \frac{2\alpha}{h_n^r} \Big) \tag{10} $$

(asymptotics are taken as $h_n \to 0$, $n \to \infty$). Taking logarithms in the above equation we obtain that the optimal bandwidth $h_n$ is a solution in $h$ of the equation

$$ 2\gamma \log h + \frac{2\beta}{h^s} + \frac{2\alpha}{h^r} = \log n + C(1 + o(1)). \tag{11} $$

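Here and in what follows we denote by $C$ constants with values in $\mathbb{R}$ that can be different on different occasions. For the bandwidth $h = h_n$ satisfying (10) and (11) we can write

$$ h_n^{r-1} \exp\Big( -\frac{2\alpha}{h_n^r} \Big) = C(1 + o(1))\, h_n^{r-s}\, \frac{h_n^{s+2\gamma-1}}{n} \exp\Big( \frac{2\beta}{h_n^s} \Big) $$

with some constant $C > 0$. Since $h_n^{r-s} \to \infty$ for $r < s$, this proves that, for the optimal bandwidth, the bias term dominates the variance term whenever $r < s$. (Strictly speaking, here we consider upper bounds on the bias and variance terms and not precisely these terms.)

Similarly, for the $L_2$-risk we get

$$ E_f \|\hat f_n - f\|_2^2 = \|E_f \hat f_n - f\|_2^2 + \mathrm{Var}_{f,2} \hat f_n \le L \exp\Big( -\frac{2\alpha}{h_n^r} \Big) + \frac{h_n^{s+2\gamma-1}}{2\pi\beta s b_{\min}^2 n} \exp\Big( \frac{2\beta}{h_n^s} \Big) (1 + o(1)), $$

and the minimizer $h_n = h_n(L_2)$ of the last expression is a solution in $h$ of the equation

$$ (r + 2\gamma - 1) \log h + \frac{2\beta}{h^s} + \frac{2\alpha}{h^r} = \log n + C(1 + o(1)). \tag{12} $$

Now, this equation implies

$$ \exp\Big( -\frac{2\alpha}{h_n^r(L_2)} \Big) = C(1 + o(1))\, h_n^{r-s}(L_2)\, \frac{h_n^{s+2\gamma-1}(L_2)}{n} \exp\Big( \frac{2\beta}{h_n^s(L_2)} \Big) $$

for some constant $C > 0$. This proves that also for the $L_2$-risk the bias term dominates the variance term whenever $r < s$.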

Thus we obtain two different equations (11) and (12) that define optimal bandwidths for pointwise and $L_2$ risks respectively, and in both cases the bias terms are asymptotically dominating.

In fact, we can obtain the same results using a single bandwidth defined as follows. Denote by $h_* = h_*(n)$ the unique solution of the equation

$$ \frac{2\beta}{h_*^s} + \frac{2\alpha}{h_*^r} = \log n - (\log\log n)^2 \tag{13} $$

(in what follows we will assume w.l.o.g. that $n \ge 3$ to ensure that $\log n > (\log\log n)^2$). A lemma in the Appendix implies that, both for the pointwise and the $L_2$ loss, the bias terms of the estimator $\hat f_n$ with bandwidth $h_*$ given by (13) are of the same order as those corresponding to bandwidths $h_n$ and $h_n(L_2)$, while the variance terms corresponding to (13) are asymptotically smaller. Thus, the pointwise risk and the $L_2$ risk of the estimator $\hat f_n$ with bandwidth $h_*$ given by (13) are asymptotically of the same order as those for estimators $\hat f_n$ with optimal bandwidths $h_n$ and $h_n(L_2)$ respectively.

Note that, in fact, $h_*$ is better than both bandwidths $h_n$ and $h_n(L_2)$ in the variance terms, but these terms are asymptotically negligible w.r.t. the bias ones (cf. the same lemma). Therefore, the improvement does not appear in the main term of the asymptotics. Note also that the sequence $(\log\log n)^2$ in (13) can be replaced by a sequence $b_n$ satisfying $b_n = o((\log n)^{1-r/s})$, $b_n/\log\log n \to \infty$ and the above argument remains valid (cf. the proof of that lemma).
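The bandwidth defined by an equation of the form $2\beta/h^s + 2\alpha/h^r = \log n - (\log\log n)^2$ has no closed form in general, but the left-hand side is strictly decreasing in $h$, so bisection finds the unique root. The sketch below assumes that form of the equation; the function names `solve_bandwidth` and `bias_rates`, the bracketing strategy and the iteration count are our own illustrative choices.

```python
import math

def solve_bandwidth(n, alpha, r, beta, s, sign=-1.0):
    """Solve 2*beta/h**s + 2*alpha/h**r = log(n) + sign*(log(log(n)))**2 for h > 0.

    sign=-1 corresponds to the bandwidth used for the upper bounds;
    sign=+1 gives the analogous bandwidth with the opposite sign of the
    (log log n)^2 term.
    """
    target = math.log(n) + sign * math.log(math.log(n)) ** 2
    if target <= 0:
        raise ValueError("right-hand side must be positive")
    lhs = lambda h: 2.0 * beta / h**s + 2.0 * alpha / h**r
    lo, hi = 1e-8, 1.0
    while lhs(hi) > target:   # lhs decreases in h: enlarge hi until lhs(hi) <= target
        hi *= 2.0
    while lhs(lo) < target:   # shrink lo until lhs(lo) >= target
        lo /= 2.0
    for _ in range(200):      # plain bisection on the bracket [lo, hi]
        mid = 0.5 * (lo + hi)
        if lhs(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def bias_rates(n, alpha, r, beta, s, L):
    """Return (h, pointwise bias bound, L2 bias bound) at the solved bandwidth."""
    h = solve_bandwidth(n, alpha, r, beta, s)
    l2 = L * math.exp(-2.0 * alpha / h**r)
    pointwise = l2 * h ** (r - 1.0) / (2.0 * math.pi * alpha * r)
    return h, pointwise, l2

h_star = solve_bandwidth(10**6, alpha=1.0, r=1.0, beta=0.5, s=2.0)
```

With $r = 1$, $s = 2$, $\alpha = 1$, $\beta = 1/2$ the equation is quadratic in $1/h$, which gives an independent check of the numerical root.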

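Calculating the upper bounds for bias terms of the estimator $\hat f_n$ with bandwidth (13) we get the following asymptotic upper bounds for its pointwise and $L_2$ risks respectively:

$$ \varphi_n^2 = \frac{L}{2\pi\alpha r}\, h_*^{r-1} \exp\Big( -\frac{2\alpha}{h_*^r} \Big) = \frac{L}{2\pi\alpha r} \Big( \frac{\log n}{2\beta} \Big)^{(1-r)/s} \exp\Big( -\frac{2\alpha}{h_*^r} \Big) (1 + o(1)) \tag{14} $$

and

$$ \varphi_n^2(L_2) = L \exp\Big( -\frac{2\alpha}{h_*^r} \Big). \tag{15} $$

The above remarks can be summarized as follows.

Theorem 1 Let $\alpha > 0$, $L > 0$, $0 < r < s < \infty$, let the left inequality in (1) hold and $\Phi^\varepsilon(u) \ne 0$, $\forall u \in \mathbb{R}$. Then the kernel estimator $\hat f_n$ with bandwidth defined by (13) satisfies the following pointwise and $L_2$-risk bounds

$$ \limsup_{n \to \infty} \sup_{x \in \mathbb{R}} R_n(x, \hat f_n, \mathcal{A}_{\alpha,r}(L))\, \varphi_n^{-2} \le 1, \tag{16} $$

$$ \limsup_{n \to \infty} R_n(L_2, \hat f_n, \mathcal{A}_{\alpha,r}(L))\, \varphi_n^{-2}(L_2) \le 1, \tag{17} $$

where the rates $\varphi_n$ and $\varphi_n(L_2)$ are given in (14) and (15).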

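The case $r = 1$ and $s = 2$ is of particular interest. It covers the situation where the noise density $f^\varepsilon$ is Gaussian ($s = 2$) and the underlying density $f$ admits an analytic continuation into a strip of the complex plane ($r = 1$), as is the case for the statistically famous densities mentioned in the introduction. This case is in the zone $r \le s/2$, where we get the following behaviour

$$ \varphi_n^2 = \begin{cases} \dfrac{L}{2\pi\alpha r} \Big( \dfrac{\log n}{2\beta} \Big)^{(1-r)/s} \exp\Big( -2\alpha \Big( \dfrac{\log n}{2\beta} \Big)^{r/s} \Big) (1 + o(1)), & \text{if } r < s/2, \\[2mm] \dfrac{L}{2\pi\alpha r} \Big( \dfrac{\log n}{2\beta} \Big)^{(1-r)/s} \exp\Big( -2\alpha \sqrt{\dfrac{\log n}{2\beta}} + \dfrac{\alpha^2}{\beta} \Big) (1 + o(1)), & \text{if } r = s/2 \end{cases} \tag{18} $$

and

$$ \varphi_n^2(L_2) = \begin{cases} L \exp\Big( -2\alpha \Big( \dfrac{\log n}{2\beta} \Big)^{r/s} \Big) (1 + o(1)), & \text{if } r < s/2, \\[2mm] L \exp\Big( -2\alpha \sqrt{\dfrac{\log n}{2\beta}} + \dfrac{\alpha^2}{\beta} \Big) (1 + o(1)), & \text{if } r = s/2. \end{cases} \tag{19} $$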
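To see numerically what "faster than any power of $\log n$" means in the case $r = s/2$, one can compare the leading term $L\exp(-2\alpha\sqrt{\log n/(2\beta)} + \alpha^2/\beta)$ with a logarithmic rate $(\log n)^{-k}$. The sketch below assumes that form of the leading term and ignores the $1 + o(1)$ factor; the function names are hypothetical.

```python
import math

def l2_rate_leading_term(n, alpha, beta, L=1.0):
    """Leading term L*exp(-2*alpha*sqrt(log n/(2*beta)) + alpha**2/beta), case r = s/2."""
    ell = math.log(n) / (2.0 * beta)
    return L * math.exp(-2.0 * alpha * math.sqrt(ell) + alpha**2 / beta)

def log_power_rate(n, k):
    """A logarithmic rate (log n)**(-k), typical for finite smoothness classes."""
    return math.log(n) ** (-k)

def rate_ratio(n, alpha, beta, k):
    """Ratio of the two rates; it tends to 0 as n grows, for every fixed k."""
    return l2_rate_leading_term(n, alpha, beta) / log_power_rate(n, k)

ratios = [rate_ratio(n, alpha=1.0, beta=0.5, k=2) for n in (1e10, 1e100, 1e300)]
```

The ratio decreases along the list, although for moderate sample sizes the constants matter and the advantage may be modest.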



The bandwidth (13) depends on the parameters $\alpha$, $r$ of the class $\mathcal{A}_{\alpha,r}(L)$ that are not known in practice. However, it is possible to construct an adaptive estimator that does not depend on these parameters and that attains the same asymptotic behavior as in Theorem 1 both for pointwise and $L_2$ risks when $r < s/2$. Define the set of parameters

$$ \Theta = \{ (\alpha, L, r) : \alpha > 0,\ L > 0,\ 0 < r < s/2 \}. $$

Note that the parameters $s$ and $\beta$ are supposed to be known since they characterize the known density of noise $f^\varepsilon$.

Theorem 2 Suppose that the left inequality in (1) holds and $\Phi^\varepsilon(u) \ne 0$, $\forall u \in \mathbb{R}$. Let $f_n^*$ be the kernel estimator defined in (2) with bandwidth $h_n = h_n^*$ defined by

$$ h_n^* = \Big( \frac{\log n}{2\beta} - \sqrt{\frac{\log n}{2\beta}} \Big)^{-1/s} \tag{20} $$

for $n$ large enough so that $\log n/(2\beta) > 1$. Then, for all $(\alpha, L, r) \in \Theta$,

$$ \limsup_{n \to \infty} \sup_{x \in \mathbb{R}} R_n(x, f_n^*, \mathcal{A}_{\alpha,r}(L))\, \varphi_n^{-2} \le 1, $$

and

$$ \limsup_{n \to \infty} R_n(L_2, f_n^*, \mathcal{A}_{\alpha,r}(L))\, \varphi_n^{-2}(L_2) \le 1, $$

where the rates $\varphi_n$ and $\varphi_n(L_2)$ are given in (14) and (15) (and, more particularly, satisfy (18) and (19) with $r < s/2$).
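An adaptive bandwidth of this kind depends only on $n$ and on the known noise parameters $\beta$, $s$. The sketch below assumes the closed form $h = (\log n/(2\beta) - \sqrt{\log n/(2\beta)})^{-1/s}$, which is consistent with the requirement $\log n/(2\beta) > 1$; the function names are hypothetical. It also evaluates the variance factor $n^{-1}\exp(2\beta/h^s)$, which for this bandwidth reduces to $\exp(-\sqrt{2\beta\log n})$.

```python
import math

def adaptive_bandwidth(n, beta, s):
    """Bandwidth (log n/(2*beta) - sqrt(log n/(2*beta)))**(-1/s); uses only beta and s."""
    ell = math.log(n) / (2.0 * beta)
    if ell <= 1.0:
        raise ValueError("need log(n)/(2*beta) > 1")
    return (ell - math.sqrt(ell)) ** (-1.0 / s)

def variance_factor(n, beta, s):
    """The factor exp(2*beta/h**s)/n appearing in the variance bounds."""
    h = adaptive_bandwidth(n, beta, s)
    return math.exp(2.0 * beta / h**s) / n

h_adaptive = adaptive_bandwidth(1e6, beta=0.5, s=2.0)
factor = variance_factor(1e6, beta=0.5, s=2.0)
```

The design point is that neither $\alpha$ nor $r$ nor $L$ enters the bandwidth, so the same choice serves every class with $r < s/2$.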



Proof. Since r/s < 1/2, we have - - y^ffY " > -^y^ff for n lar § e 



enough, and thus 




On the other hand, 




12 




Therefore, the ratio of the bias term of $f_n^*$ to the variance term of $f_n^*$, both for the pointwise risk and for the $L_2$-risk, is bounded from below by



(logn) exp j3 



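for some $b \in \mathbb{R}$. This expression tends to $\infty$ as $n \to \infty$. Thus, the variance terms are asymptotically negligible w.r.t. the bias terms. It remains to check that the bias terms of $f_n^*$ for both risks are asymptotically bounded by $\varphi_n^2$ and $\varphi_n^2(L_2)$ respectively.

In view of Proposition 1, for $n$ large enough the bias term of $f_n^*$ for the pointwise risk is bounded from above by

$$ \frac{L}{2\pi\alpha r} \Big( \frac{\log n}{2\beta} \Big)^{(1-r)/s} \exp\Big( -2\alpha \Big( \frac{\log n}{2\beta} \Big)^{r/s} + c \Big( \frac{\log n}{2\beta} \Big)^{r/s-1/2} \Big) (1 + o(1)) = \varphi_n^2 (1 + o(1)), $$

where $c > 0$ is a constant and we have used (18) with $r < s/2$ for the last equality. Similarly, for $n$ large enough the bias term of $f_n^*$ for the $L_2$-risk is bounded from above by

$$ L \exp\Big( -2\alpha \Big( \frac{\log n}{2\beta} - \sqrt{\frac{\log n}{2\beta}} \Big)^{r/s} \Big) \le L \exp\Big( -2\alpha \Big( \frac{\log n}{2\beta} \Big)^{r/s} + c \Big( \frac{\log n}{2\beta} \Big)^{r/s-1/2} \Big) = \varphi_n^2(L_2) (1 + o(1)), $$

where $c > 0$ and we have used (19) with $r < s/2$ for the last equality. □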
If $r = s/2$, adaptation to $(\alpha, L)$ is still possible via a procedure similar to that of Theorem 2, but it does not attain the exact constant, as the following result shows. Introduce the set

$$ \Theta_0 = \{ (\alpha, L) : 0 < \alpha < \alpha_0,\ L > 0 \}, $$

where $\alpha_0 > 0$ is a constant.

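Theorem 3 Suppose that the left inequality in (1) holds and $\Phi^\varepsilon(u) \ne 0$, $\forall u \in \mathbb{R}$. Let $f_n^*$ be the kernel estimator defined in (2) with bandwidth $h_n = h_n^*$ defined by

$$ h_n^* = \Big( \frac{\log n}{2\beta} - \frac{A}{\beta} \sqrt{\frac{\log n}{2\beta}} \Big)^{-1/s}, $$

where $A > \alpha_0$ and $n$ is large enough so that $\log n/(2\beta) > (A/\beta)^2$. Then for $r = s/2$ and for all $(\alpha, L) \in \Theta_0$,

$$ \limsup_{n \to \infty} \sup_{x \in \mathbb{R}} R_n(x, f_n^*, \mathcal{A}_{\alpha,r}(L))\, \varphi_n^{-2} \le \exp\Big( \frac{\alpha A}{\beta} - \frac{\alpha^2}{\beta} \Big), \tag{21} $$

$$ \limsup_{n \to \infty} R_n(L_2, f_n^*, \mathcal{A}_{\alpha,r}(L))\, \varphi_n^{-2}(L_2) \le \exp\Big( \frac{\alpha A}{\beta} - \frac{\alpha^2}{\beta} \Big), \tag{22} $$

where the rates $\varphi_n$ and $\varphi_n(L_2)$ are given in (14) and (15).

Proof. It is easily checked that the bias exponent satisfies

$$ \exp\Big( -\frac{2\alpha}{(h_n^*)^r} \Big) = \exp\Big( -2\alpha \sqrt{\frac{\log n}{2\beta}} + \frac{\alpha A}{\beta} \Big) (1 + o(1)), $$

while for the variance term exponent

$$ \frac{1}{n} \exp\Big( \frac{2\beta}{(h_n^*)^s} \Big) = \exp\Big( -2A \sqrt{\frac{\log n}{2\beta}} \Big). $$

Since $A > \alpha$, the bias term of $f_n^*$ asymptotically dominates its variance term. Inequalities (21) and (22) now follow from these remarks and the expressions for $\varphi_n^2$, $\varphi_n^2(L_2)$ in (18) and (19) with $r = s/2$. □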



4 Minimax lower bounds, sharp optimality and 
superefficiency 

In this section we establish lower bounds for the risks showing that, under mild additional assumptions, the upper bounds of the previous section cannot be improved (in a minimax sense on the class of densities $\mathcal{A}_{\alpha,r}(L)$) not only among kernel estimators, but also among all estimators. In other words, the estimators suggested in the previous section attain optimal rates of convergence on $\mathcal{A}_{\alpha,r}(L)$ with optimal exact constants.

We suppose that the following assumption holds.

Assumption (ND). There exist constants $u_1 > 0$, $B > 0$ and $\gamma_1 \in \mathbb{R}$ such that $\Phi^\varepsilon(u)$ is twice continuously differentiable for $|u| \ge u_1$ with the derivatives satisfying

$$ \max\{ |(\Phi^\varepsilon(u))'|, |(\Phi^\varepsilon(u))''| \} \le B |u|^{\gamma_1} \exp(-\beta|u|^s), $$

where $\beta > 0$ and $s > 0$ are the same as in Assumption (N).

Note that this assumption is satisfied for the examples of popular noise densities mentioned in the Introduction.

Theorem 4 Let $\alpha > 0$, $L > 0$, $0 < r < s \le 2$, and suppose that Assumption (ND) and the right hand inequality in (1) hold. Then

$$ \liminf_{n \to \infty} \inf_{T_n} R_n(x, T_n, \mathcal{A}_{\alpha,r}(L))\, \varphi_n^{-2} \ge 1, \quad \forall x \in \mathbb{R}, \tag{23} $$

and

$$ \liminf_{n \to \infty} \inf_{T_n} R_n(L_2, T_n, \mathcal{A}_{\alpha,r}(L))\, \varphi_n^{-2}(L_2) \ge 1, \tag{24} $$

where $\inf_{T_n}$ denotes the infimum over all estimators and the rates $\varphi_n$, $\varphi_n(L_2)$ are defined in (14) and (15).

Proof of Theorem 4 is given in Section 5.

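Theorems 1, 2 and 4 immediately imply the following result on sharp asymptotic minimaxity of the estimators constructed in Section 3.

Theorem 5 Let $\alpha > 0$, $L > 0$, $0 < r < s \le 2$, let Assumptions (N), (ND) hold and $\Phi^\varepsilon(u) \ne 0$, $\forall u \in \mathbb{R}$. Then the kernel estimator $\hat f_n$ with bandwidth defined by (13) (or with bandwidth defined by (20) if $r < s/2$) is sharp asymptotically minimax on $\mathcal{A}_{\alpha,r}(L)$ both in pointwise and in $L_2$ sense:

$$ \lim_{n \to \infty} R_n(x, \hat f_n, \mathcal{A}_{\alpha,r}(L))\, \varphi_n^{-2} = \lim_{n \to \infty} \inf_{T_n} R_n(x, T_n, \mathcal{A}_{\alpha,r}(L))\, \varphi_n^{-2} = 1, \quad \forall x \in \mathbb{R}, \tag{25} $$

$$ \lim_{n \to \infty} R_n(L_2, \hat f_n, \mathcal{A}_{\alpha,r}(L))\, \varphi_n^{-2}(L_2) = \lim_{n \to \infty} \inf_{T_n} R_n(L_2, T_n, \mathcal{A}_{\alpha,r}(L))\, \varphi_n^{-2}(L_2) = 1. \tag{26} $$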

This is the main result of the paper. It shows that the kernel estimator $\hat f_n$ with a properly chosen bandwidth $h_n$ is sharp optimal in the asymptotically minimax sense on $\mathcal{A}_{\alpha,r}(L)$ and that for $r < s/2$ the estimator is sharp adaptive in the asymptotically minimax sense on $\mathcal{A}_{\alpha,r}(L)$. Sharp adaptation is thus obtained by direct tuning of the smoothing parameter without any additional adaptation rule. This is one of the effects of dominating bias. Theorem 5 also provides exact asymptotic expressions for minimax risks on $\mathcal{A}_{\alpha,r}(L)$ under the pointwise and the $L_2$ losses: it states that they are equal to $\varphi_n^2$ and $\varphi_n^2(L_2)$ respectively.

Thus, $\varphi_n$ and $\varphi_n(L_2)$ can be chosen as reference values to determine efficiency of estimators. An interesting question is whether there exist superefficient estimators $\tilde f_n$, i.e. such that

$$ \sup_{x \in \mathbb{R}} E_f |\tilde f_n(x) - f(x)|^2 = o(\varphi_n^2) \quad \text{and} \quad E_f \|\tilde f_n - f\|_2^2 = o(\varphi_n^2(L_2)), \tag{27} $$

as $n \to \infty$, for any fixed $f \in \mathcal{A}_{\alpha,r}(L)$. The answer to this question is positive, as the next proposition shows.

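Proposition 3 Let the conditions of Theorem 1 hold. Let $\tilde f_n$ be the kernel estimator $\hat f_n$ with bandwidth defined by (13) (or by (20) if $r < s/2$). Then $\tilde f_n$ satisfies (27). If, moreover, the conditions of Theorem 4 hold, $\tilde f_n$ is superefficient in the sense that

$$ \lim_{n \to \infty} \frac{E_f\big[ |\tilde f_n(x) - f(x)|^2 \big]}{\inf_{T_n} \sup_{f \in \mathcal{A}_{\alpha,r}(L)} E_f\big[ |T_n(x) - f(x)|^2 \big]} = 0, \quad \forall x \in \mathbb{R}, \tag{28} $$

$$ \lim_{n \to \infty} \frac{E_f\big[ \|\tilde f_n - f\|_2^2 \big]}{\inf_{T_n} \sup_{f \in \mathcal{A}_{\alpha,r}(L)} E_f\big[ \|T_n - f\|_2^2 \big]} = 0. \tag{29} $$

Proof. Consider the kernel estimator $\hat f_n$ with bandwidth defined by (13). Instead of using Proposition 1 to bound the bias term, we apply directly (4) for the pointwise risk and (5) for the $L_2$-risk, which yields that, for any fixed $f \in \mathcal{A}_{\alpha,r}(L)$,

$$ \sup_{x \in \mathbb{R}} |E_f \hat f_n(x) - f(x)|^2 = o\big( h_*^{r-1} \exp(-2\alpha/h_*^r) \big) = o(\varphi_n^2), $$

$$ \|E_f \hat f_n - f\|_2^2 = o\big( \exp(-2\alpha/h_*^r) \big) = o(\varphi_n^2(L_2)), $$

as $n \to \infty$. Indeed, for a fixed $f$ the tail integral $\int_{|u| > 1/h_*} |\Phi^X(u)|^2 \exp(2\alpha|u|^r)\,du$ tends to 0. Now, Proposition 2 and (68) of the corresponding lemma in the Appendix imply that the variance terms are also $o(\varphi_n^2)$ and $o(\varphi_n^2(L_2))$, as $n \to \infty$, respectively. Hence, (27) follows and implies (28) and (29), in view of Theorem 4. The case where the bandwidth is defined by (20) and $r < s/2$ is treated similarly. □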

The result of Proposition 3 is explained by the fact that the value of the minimax risk in the denominator of (29) is attained (up to a $1 + o(1)$ factor) on the densities that depend on $n$, while in the numerator we have a fixed density $f$. Such a superefficiency property occurs in other nonparametric problems (see e.g. Brown, Low and Zhao (1997) or Tsybakov (2004), Chapter 3), where it is proved for various adaptive estimators. On the contrary, non-adaptive asymptotically minimax estimators, for example, the Pinsker estimator which is efficient for ellipsoids in the Gaussian sequence model, are not superefficient and turn out to be inadmissible (Tsybakov (2004), Section 3.8). Compared with that, the result of Proposition 3 is somewhat surprising, because it states that a non-adaptive asymptotically minimax estimator $\hat f_n$ with bandwidth defined by (13) is superefficient. This provides a simple example of a superefficient nonparametric estimator which is not adaptive. We conjecture that this is a general property of nonparametric problems with dominating bias.

5 Proof of Theorem 4

5.1 General scheme of the proof 

We use the method of proving lower bounds by reduction to the problem of testing two simple hypotheses (cf. e.g. Tsybakov (2004), Chapter 2). Namely, we define two properly chosen probability densities $f_{n1}$ and $f_{n2}$, depending on $n$ and belonging to $\mathcal{A}_{\alpha,r}(L)$, and we bound the minimax risk as follows

$$ \inf_{T_n} R_n(T_n, \mathcal{A}_{\alpha,r}(L))\, \psi_n^{-2} \ge \inf_{T_n} \max_{f \in \{f_{n1}, f_{n2}\}} E_f\, d^2(T_n, f)\, \psi_n^{-2} \ge \inf_{T_n} \max_{f \in \{f_{n1}, f_{n2}\}} \big( E_f\, d(T_n, f) \big)^2 \psi_n^{-2}, \tag{30} $$

where $R_n(T_n, \mathcal{A}_{\alpha,r}(L))$ is either $R_n(x, T_n, \mathcal{A}_{\alpha,r}(L))$ or $R_n(L_2, T_n, \mathcal{A}_{\alpha,r}(L))$, $\psi_n$ is defined as $\varphi_n$ or $\varphi_n(L_2)$ (cf. (14) and (15)) respectively and $d(T_n, f)$ stands for the distance $|T_n(x) - f(x)|$ at a fixed point $x$ or the $L_2$-distance $\|T_n - f\|_2$ respectively. Hence, to prove the theorem it remains to show that

$$ R_d = \inf_{T_n} \max_{f \in \{f_{n1}, f_{n2}\}} E_f\, d(T_n, f) \ge \psi_n (1 + o(1)), \tag{31} $$

as $n \to \infty$, for both pointwise and $L_2$ distances $d(\cdot, \cdot)$. This will be done by application of a lemma of the Appendix. According to that lemma, (31) is satisfied if the functions $f_{n1}$ and $f_{n2}$ are chosen such that

$$ d(f_{n1}, f_{n2}) \ge 2\psi_n (1 + o(1)), \quad \text{as } n \to \infty, \tag{32} $$

$$ \chi^2(P_{f_{n1}}, P_{f_{n2}}) = o(1), \quad \text{as } n \to \infty, \tag{33} $$

where $\chi^2(P_{f_{n1}}, P_{f_{n2}})$ is the $\chi^2$-divergence between the probability measures $P_{f_{n1}}$ and $P_{f_{n2}}$ (recall that $P_f$ denotes the joint distribution of $Y_1, \dots, Y_n$ when the underlying probability density of the $X_i$'s is $f$). Thus, to prove Theorem 4 it suffices to construct two functions $f_{n1}$ and $f_{n2}$ belonging to $\mathcal{A}_{\alpha,r}(L)$ and satisfying (32)-(33). Since $P_{f_{nj}}$ is a product of $n$ identical probability measures corresponding to the density $f^Y_{nj} = f_{nj} * f^\varepsilon$ for $j = 1, 2$, we have $\chi^2(P_{f_{n1}}, P_{f_{n2}}) \le C n \chi^2(f^Y_{n1}, f^Y_{n2})$ if $\chi^2(f^Y_{n1}, f^Y_{n2}) \le 1/n$, where $C$ is a finite constant and

$$ \chi^2(f^Y_{n1}, f^Y_{n2}) = \int \frac{(f^Y_{n2} - f^Y_{n1})^2}{f^Y_{n1}}(x)\,dx $$

(cf. e.g. Tsybakov (2004), p. 72). Therefore, (33) follows from

$$ n \chi^2(f^Y_{n1}, f^Y_{n2}) = o(1), \quad \text{as } n \to \infty. \tag{34} $$

We now proceed to the construction of densities $f_{n1}, f_{n2} \in \mathcal{A}_{\alpha,r}(L)$ satisfying (34) and (32) for pointwise and $L_2$-distances $d(\cdot, \cdot)$.
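The reduction from the $n$-fold product measures to a single observation rests on the standard tensorization identity for the $\chi^2$-divergence. The following short derivation, written for generic densities $p$ and $q$ (our notation, not the paper's), also gives one admissible value of the constant $C$.

```latex
\chi^2(p,q) = \int \frac{(q-p)^2}{p} = \int \frac{q^2}{p} - 1
\quad\Longrightarrow\quad
1 + \chi^2\big(p^{\otimes n}, q^{\otimes n}\big)
  = \Big(\int \frac{q^2}{p}\Big)^{n}
  = \big(1 + \chi^2(p,q)\big)^{n}
  \le \exp\big(n\,\chi^2(p,q)\big).
```

Since $e^{t} - 1 \le (e - 1)\,t$ for $0 \le t \le 1$, the condition $n\chi^2(p,q) \le 1$ gives $\chi^2(p^{\otimes n}, q^{\otimes n}) \le (e-1)\, n\, \chi^2(p,q)$, so that $C = e - 1$ works under this convention.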

Consider a density $f_0$ of a symmetric stable law whose characteristic function is

$$ \Phi^0(u) = \begin{cases} \exp(-|c_0 u|^r), & \text{if } 1 < r < 2, \\ \exp(-|c_0 u|), & \text{if } 0 < r \le 1, \end{cases} $$

where $c_0 > \max\{\alpha^{1/r}, \alpha\}$. Clearly, for any $0 < a < 1$ there exists $c_0 > 0$ large enough so that $f_0 \in \mathcal{A}_{\alpha,r}(a^2 L)$. In view of Lemma 7 there exists $c_1 > 0$ such that

$$ f_0(x) = \frac{1}{c_0}\, p\Big( \frac{x}{c_0} \Big) \ge \frac{c_1}{|x|^{\max\{r,1\}+1} + 1} \tag{35} $$

for all $x \in \mathbb{R}$, where $p$ is the density of the stable symmetric distribution with characteristic function $\exp(-|t|^{\max\{r,1\}})$, $0 < r < 2$. Let $h_+ = h_+(n)$ be the unique solution of the equation

$$ \frac{2\beta}{h_+^s} + \frac{2\alpha}{h_+^r} = \log n + (\log\log n)^2. \tag{36} $$

Note that $h_+$ is analogous to $h_*$ defined by (13) with the only difference that the $(\log\log n)^2$ term changes the sign.

We define the densities $f_{n1}$ and $f_{n2}$ by their characteristic functions

$$\Phi_{n1}(u) = \Phi_0(u) + \Phi^H(u, h_+), \qquad \Phi_{n2}(u) = \Phi_0(u) - \Phi^H(u, h_+), \qquad u \in \mathbb{R}, \tag{37}$$

where $u \mapsto \Phi^H(u, h)$ with $h > 0$ will be called perturbation function and will be defined differently for the pointwise distance and the $L_2$-distance. The construction of perturbation functions will be based on the following lemma.

Lemma 1 For any $\delta > 0$ and any $D > 4\delta$ there exists a function $\Phi^G : \mathbb{R} \to [0, 1]$ such that

(i) $\Phi^G$ is 3 times continuously differentiable on $\mathbb{R}$ and the first 3 derivatives of $\Phi^G$ are uniformly bounded on $\mathbb{R}$,

(ii) $\Phi^G$ is compactly supported on $(\delta, D - \delta)$ and

$$I(2\delta \le u \le D - 2\delta) \le \Phi^G(u) \le I(\delta \le u \le D - \delta),$$

for all $u \in \mathbb{R}$.

Proof of Lemma 1. Denote by $J_0$ the 5-fold convolution of the indicator function $I(|u| \le 1)$ with itself. Let $J : \mathbb{R} \to [0, \infty)$ be a rescaling of $J_0$ such that the support of $J$ is $(-1, 1)$ and $\int J(x)\,dx = 1$. Then $J_0$ and $J$ are 3 times continuously differentiable on $\mathbb{R}$. For $\delta > 0$ and $D > 4\delta$ define

$$\Phi^G(u) = \int_{3\delta/2}^{D - 3\delta/2} \frac{2}{\delta}\, J\!\left(\frac{2(x - u)}{\delta}\right) dx.$$

Clearly, $\Phi^G$ is 3 times continuously differentiable on $\mathbb{R}$ and $0 \le \Phi^G(u) \le 1$, $\forall u \in \mathbb{R}$. Moreover, $\operatorname{supp} \Phi^G = (\delta, D - \delta)$ and for any $u \in (2\delta, D - 2\delta)$ we have $\Phi^G(u) = \int J(x)\,dx = 1$. □
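One possible construction of a function with the properties of Lemma 1 can be written in closed form. The sketch below assumes that "5-fold convolution" means five copies of the indicator, so that $J$ is a rescaled quartic B-spline (the Irwin-Hall density with five summands), and that the smoothing scale is $\delta/2$; both are assumptions made for illustration, and the function names are hypothetical.

```python
from math import comb

def bspline5_cdf(t: float) -> float:
    """CDF of the 5-fold convolution of the indicator of [0, 1] (support [0, 5])."""
    t = min(max(t, 0.0), 5.0)
    total = sum((-1) ** k * comb(5, k) * max(t - k, 0.0) ** 5 for k in range(6))
    return total / 120.0

def kernel_cdf(y: float) -> float:
    """CDF of J: the same spline rescaled to have support (-1, 1)."""
    return bspline5_cdf(2.5 * (y + 1.0))

def phi_g(u: float, delta: float, D: float) -> float:
    """Indicator of [3*delta/2, D - 3*delta/2] smoothed by J at scale delta/2."""
    a, b = 1.5 * delta, D - 1.5 * delta
    value = kernel_cdf(2.0 * (u - a) / delta) - kernel_cdf(2.0 * (u - b) / delta)
    return min(max(value, 0.0), 1.0)  # guard against rounding outside [0, 1]

delta, D = 0.5, 4.0
for u in (0.5, 0.75, 1.0, 2.0, 3.0, 3.25, 3.5):
    print(u, phi_g(u, delta, D))
```

With these choices the function vanishes outside $(\delta, D - \delta)$, equals 1 on $[2\delta, D - 2\delta]$, and inherits three continuous derivatives from the spline.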

5.2 Lower bound at a fixed point 

Without loss of generality, we will prove the lower bound for the distance $d(f, g) = |f(0) - g(0)|$ at the point $x = 0$ (if $x \ne 0$ it suffices to shift the functions $f_{n1}$ and $f_{n2}$ at $x$). Define the perturbation function

$$\Phi^H(u, h) = \sqrt{2\pi\alpha r L}\; h^{(1-r)/2} \exp\!\left(\frac{\alpha}{h^r}\right) \exp(-2\alpha|u|^r)\, \Phi^G\!\left(|u|^r - \frac{1}{h^r}\right), \tag{38}$$

where $\Phi^G$ is a function satisfying the properties given in Lemma 1 for some $\delta > 0$ and $D > 4\delta$.

Most of the computations below work when $\Phi^G$ is replaced by an indicator function of the interval $[0, D]$. However, we obviously need a continuous perturbation function $\Phi^H$ that satisfies $\Phi^H(0) = 0$ to ensure that $f_{n1}$ and $f_{n2}$ integrate to 1 and that is smooth enough to allow an appropriate bound on the $\chi^2$-divergence.

Lemma 2 Let $f_{n1}$ and $f_{n2}$ be the functions defined by their Fourier transforms (37), (38) with $\Phi^G$ satisfying the properties given in Lemma 1. Then we have the following.

1. The functions $f_{n1}$ and $f_{n2}$ are probability densities for any $n$ large enough.

2. The functions $f_{n1}$ and $f_{n2}$ belong to $\mathcal{A}_{\alpha,r}(L)$ for $n$ large enough if $c_0 > 0$ in the definition of $f_0$ is large enough.

3. The distance between $f_{n1}$ and $f_{n2}$ at $x = 0$ satisfies

$$|f_{n1}(0) - f_{n2}(0)| \ge 2\varphi_n \left[e^{-4\alpha\delta} - e^{-2\alpha(D - 2\delta)}\right](1 + o(1)),$$

as $n \to \infty$.

4. The $\chi^2$-divergence $\chi^2(f^Y_{n1}, f^Y_{n2})$ satisfies (34).

Proof. 1. Clearly, $\Phi^H(\cdot, h)$ is an even, 3 times continuously differentiable function on $\mathbb{R}$ having a compact support. It is easy to see that the integrals $\int |\Phi^H(u, h)|\,du$ and $\int |\partial^3 \Phi^H(u, h)/\partial u^3|\,du$ are bounded uniformly over $0 < h \le h_0$ for any $h_0 > 0$. Integration by parts yields that the inverse Fourier transform of $\Phi^H(\cdot, h)$ can be written as

$$H(x, h) \stackrel{\text{def}}{=} \frac{1}{2\pi} \int \cos(xu)\, \Phi^H(u, h)\,du = \frac{1}{2\pi x^3} \int \sin(xu)\, \frac{\partial^3 \Phi^H(u, h)}{\partial u^3}\,du \tag{39}$$

for all $x \in \mathbb{R}$ and $0 < h \le h_0$. Thus, there exists a constant $C_H < \infty$ independent of $n$ and such that

$$|H(x, h_+)| \le C_H (|x|^3 + 1)^{-1}, \quad \text{for all } x \in \mathbb{R}. \tag{40}$$

Denote by Dom the common support of the functions $\Phi^G(|u|^r - 1/h_+^r)$ and $\Phi^H(u, h_+)$:

$$\text{Dom} \stackrel{\text{def}}{=} \left\{u : |u|^r - \frac{1}{h_+^r} \in [\delta, D - \delta]\right\} = \left\{u : \left(\delta + \frac{1}{h_+^r}\right)^{1/r} \le |u| \le \left(D - \delta + \frac{1}{h_+^r}\right)^{1/r}\right\}.$$

Using the fact that $(\delta + 1/h_+^r)^{1/r} \to \infty$, as $n \to \infty$, for any fixed $\delta > 0$ and applying (63) of Lemma 6 in the Appendix, we find

$$\|H(\cdot, h_+)\|_\infty \stackrel{\text{def}}{=} \sup_{x} |H(x, h_+)| \le \frac{1}{2\pi} \int |\Phi^H(u, h_+)|\,du \le \frac{\sqrt{2\pi\alpha r L}}{2\pi}\, h_+^{(1-r)/2} \exp\!\left(\frac{\alpha}{h_+^r}\right) \int_{\text{Dom}} \exp(-2\alpha|u|^r)\,du \le c\, h_+^{(r-1)/2} \exp\!\left(-\frac{\alpha}{h_+^r}\right) = o(1), \quad \text{as } n \to \infty, \tag{41}$$

where $c > 0$ is a finite constant.

Now, $f_{n1}(x) = f_0(x) + H(x, h_+)$, $f_{n2}(x) = f_0(x) - H(x, h_+)$. Choose $A > 0$ large enough so that for $|x| > A$ we have $C_H(|x|^3 + 1)^{-1} < c_1'(|x|^{\max\{r+1,2\}} + 1)^{-1}$ (note that $\max\{r + 1, 2\} < 3$). Then, in view of (35) and (40), $f_{nj}(x) > 0$, $j = 1, 2$, for $|x| > A$. Now, if $n$ is large enough, $f_{nj}(x) > 0$ also for $|x| \le A$ since $\inf_{|x| \le A} f_0(x) > 0$ (cf. (35)) and (41) holds.

Thus, $f_{nj}(x) > 0$, $j = 1, 2$, for all $x \in \mathbb{R}$ if $n$ is large enough. It remains to note that $f_{n1}$ and $f_{n2}$ integrate to 1 since $\int H(x, h_+)\,dx = \Phi^H(0, h_+) = 0$ (indeed, $0 \notin \operatorname{supp} \Phi^H(\cdot, h_+) = \text{Dom}$).

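2. We have, by (38) and Lemma 1,

$$\int |\Phi^H(u, h_+)|^2 \exp(2\alpha|u|^r)\,du \le 2\pi\alpha r L\, h_+^{1-r} \exp\!\left(\frac{2\alpha}{h_+^r}\right) \int_{\text{Dom}} \exp(-2\alpha|u|^r)\,du \le 4\pi\alpha r L\, h_+^{1-r} \exp\!\left(\frac{2\alpha}{h_+^r}\right) \int_{(\delta + 1/h_+^r)^{1/r}}^{\infty} \exp(-2\alpha u^r)\,du.$$

By Lemma 6,

$$\int_{(\delta + 1/h_+^r)^{1/r}}^{\infty} \exp(-2\alpha u^r)\,du = \frac{h_+^{r-1}}{2\alpha r} \exp\!\left(-\frac{2\alpha}{h_+^r}\right) \exp(-2\alpha\delta)\,(1 + \delta h_+^r)^{(1-r)/r}(1 + o(1)),$$

as $n \to \infty$. We get therefore,

$$\int |\Phi^H(u, h_+)|^2 \exp(2\alpha|u|^r)\,du \le 2\pi L \exp(-2\alpha\delta)(1 + o(1)), \tag{42}$$

as $n \to \infty$, for any fixed $\delta > 0$. Now, choose $c_0 > 0$ in the definition of $f_0$ large enough to guarantee that $f_0 \in \mathcal{A}_{\alpha,r}(a^2 L)$ with $a = 1 - e^{-\alpha\delta/2}$. This and (42) imply

$$\left(\int |\Phi_{nj}(u)|^2 \exp(2\alpha|u|^r)\,du\right)^{1/2} \le \|\Phi_0(\cdot) \exp(\alpha|\cdot|^r)\|_2 + \|\Phi^H(\cdot, h_+) \exp(\alpha|\cdot|^r)\|_2 \le (1 - e^{-\alpha\delta/2})\sqrt{2\pi L} + e^{-\alpha\delta}\sqrt{2\pi L}\,(1 + o(1)) < \sqrt{2\pi L}, \quad j = 1, 2,$$

for $n$ large enough and any fixed $\delta > 0$.

3. Using the left inequality in (ii) of Lemma 1 we get

$$|f_{n1}(0) - f_{n2}(0)|^2 = \frac{1}{(2\pi)^2} \left(\int (\Phi_{n1}(u) - \Phi_{n2}(u))\,du\right)^2 = \frac{4}{(2\pi)^2} \left(\int \Phi^H(u, h_+)\,du\right)^2$$

$$= \frac{2\alpha r L\, h_+^{1-r}}{\pi} \exp\!\left(\frac{2\alpha}{h_+^r}\right) \left(\int \exp(-2\alpha|u|^r)\, \Phi^G\!\left(|u|^r - \frac{1}{h_+^r}\right) du\right)^2$$

$$\ge \frac{8\alpha r L\, h_+^{1-r}}{\pi} \exp\!\left(\frac{2\alpha}{h_+^r}\right) \left(\int_{(2\delta + 1/h_+^r)^{1/r}}^{(D - 2\delta + 1/h_+^r)^{1/r}} \exp(-2\alpha u^r)\,du\right)^2. \tag{43}$$

By (63) of Lemma 6 in the Appendix,

$$\int_{(2\delta + 1/h_+^r)^{1/r}}^{(D - 2\delta + 1/h_+^r)^{1/r}} \exp(-2\alpha u^r)\,du = \frac{h_+^{r-1}}{2\alpha r} \exp\!\left(-\frac{2\alpha}{h_+^r}\right) \Big[(1 + 2\delta h_+^r)^{(1-r)/r} e^{-4\alpha\delta}(1 + o(1)) - (1 + (D - 2\delta) h_+^r)^{(1-r)/r} e^{-2\alpha(D - 2\delta)}(1 + o(1))\Big]$$

$$= \frac{h_+^{r-1}}{2\alpha r} \exp\!\left(-\frac{2\alpha}{h_+^r}\right) \left[e^{-4\alpha\delta} - e^{-2\alpha(D - 2\delta)}\right](1 + o(1)), \tag{44}$$

as $n \to \infty$. The expression in square brackets here is positive since $D > 4\delta$. Combining (43) and (44) and using (77) of Lemma 9 in the Appendix together with the definition of $\varphi_n$ we get

$$|f_{n1}(0) - f_{n2}(0)|^2 \ge 4\, \frac{L}{2\pi\alpha r}\, h_+^{r-1} \exp\!\left(-\frac{2\alpha}{h_+^r}\right) \left[e^{-4\alpha\delta} - e^{-2\alpha(D - 2\delta)}\right]^2 (1 + o(1))$$

$$= 4\, \frac{L}{2\pi\alpha r}\, h_*^{r-1} \exp\!\left(-\frac{2\alpha}{h_*^r}\right) \left[e^{-4\alpha\delta} - e^{-2\alpha(D - 2\delta)}\right]^2 (1 + o(1)) = 4\varphi_n^2 \left[e^{-4\alpha\delta} - e^{-2\alpha(D - 2\delta)}\right]^2 (1 + o(1)),$$

as $n \to \infty$.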

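4. Inequalities (35), (40), (41) and the fact that $r < 2$ imply the existence of a constant $c_2' > 0$ independent of $n$ and such that

$$f_{n1}(x) \ge \frac{c_2'}{|x|^{\max\{r+1,2\}} + 1}, \quad \forall x \in \mathbb{R},$$

for all $n$ large enough. Since $f^\varepsilon$ is a probability density, we have $\int_{-M}^{M} f^\varepsilon(x)\,dx > 1/2$ for a constant $M > 1$ large enough. Hence,

$$f^Y_{n1}(x) \ge \int_{-M}^{M} f_{n1}(x - y) f^\varepsilon(y)\,dy \ge \frac{c_2'}{2} \inf_{|y| \le M} \frac{1}{|x - y|^{\max\{r+1,2\}} + 1} \ge c_3' \min\left\{\frac{1}{M^{\max\{r+1,2\}}}, \frac{1}{|x|^{\max\{r+1,2\}}}\right\}, \tag{45}$$

where $n$ and $M$ are large enough, $c_3' > 0$ is independent of $n$, and the last inequality is obtained by considering separately $|x| \le M$ and $|x| > M$. Thus

$$n\chi^2(f^Y_{n1}, f^Y_{n2}) = n \int \frac{(f^Y_{n1} - f^Y_{n2})^2(x)}{f^Y_{n1}(x)}\,dx = 4n \int \frac{(H * f^\varepsilon)^2(x)}{f^Y_{n1}(x)}\,dx$$

$$\le \frac{4}{c_3'} \left(n M^{\max\{r+1,2\}} \int_{|x| \le M} (H * f^\varepsilon)^2(x)\,dx + n \int_{|x| > M} |x|^{\max\{r+1,2\}} (H * f^\varepsilon)^2(x)\,dx\right) \le \frac{4M^3}{c_3'}(T_{n1} + T_{n2}), \tag{46}$$

for $n$ and $M$ large enough, where $H(x) = H(x, h_+)$ for brevity and

$$T_{n1} = n\|H * f^\varepsilon\|_2^2, \qquad T_{n2} = n \int |x|^4 (H * f^\varepsilon)^2(x)\,dx. \tag{47}$$

Using Plancherel's formula and the right-hand inequality in Assumption (N) we get, for $n$ large enough,

$$\|H * f^\varepsilon\|_2^2 = \frac{1}{2\pi} \int |\Phi^H(u, h_+) \Phi^\varepsilon(u)|^2\,du \le b_{\max}^2\, \alpha r L\, h_+^{1-r} \exp\!\left(\frac{2\alpha}{h_+^r}\right) \int_{\text{Dom}} |u|^{2\gamma'} \exp(-4\alpha|u|^r - 2\beta|u|^s)\,du$$

$$\le 2 b_{\max}^2\, \alpha r L\, h_+^{1-r} \exp\!\left(\frac{2\alpha}{h_+^r}\right) \int_{(\delta + 1/h_+^r)^{1/r}}^{\infty} u^{2\gamma'} \exp(-4\alpha u^r - 2\beta u^s)\,du \le 2 b_{\max}^2\, \alpha r L\, h_+^{1-r} \exp\!\left(-\frac{2\alpha}{h_+^r}\right) \int_{1/h_+}^{\infty} u^{2\gamma'} \exp(-2\beta u^s)\,du. \tag{48}$$

The last integral is evaluated using (63) of Lemma 6 in the Appendix:

$$\int_{1/h_+}^{\infty} u^{2\gamma'} \exp(-2\beta u^s)\,du = \frac{h_+^{s - 2\gamma' - 1}}{2\beta s} \exp\!\left(-\frac{2\beta}{h_+^s}\right)(1 + o(1)), \tag{49}$$

as $n \to \infty$. This, together with (48) and (78) of Lemma 9 in the Appendix, yields

$$\|H * f^\varepsilon\|_2^2 \le C h_+^{s - r - 2\gamma'} \exp\!\left(-\frac{2\alpha}{h_+^r} - \frac{2\beta}{h_+^s}\right)(1 + o(1)) = o\!\left(\frac{1}{n}\right), \tag{50}$$

as $n \to \infty$, where $C > 0$ is a constant. Thus,

$$T_{n1} = o(1), \quad \text{as } n \to \infty. \tag{51}$$

Now, assume that $n$ is large enough to have $(\delta + 1/h_+^r)^{1/r} > \max(u_0, u_1)$, where $u_0 > 0$, $u_1 > 0$ are the constants in Assumptions (N) and (ND). Then $\Phi^G(|u|^r - 1/h_+^r) = 0$ for $|u| \le \max(u_0, u_1)$, and thus the function $\Phi^H(\cdot, h_+)\Phi^\varepsilon(\cdot)$ is twice continuously differentiable on $\mathbb{R}$. Using Assumption (ND), the right-hand inequality in Assumption (N) and the fact that $\Phi^G$, together with its first two derivatives, is uniformly bounded on $\mathbb{R}$ we find that there exist constants $B_1 < \infty$ and $a \in \mathbb{R}$ such that, for $n$ large enough and all $u \in \mathbb{R}$,

$$\left|\left(\Phi^H(u, h_+)\Phi^\varepsilon(u)\right)''\right| \le B_1 h_+^{(1-r)/2} \exp\!\left(\frac{\alpha}{h_+^r}\right) |u|^a \exp(-2\alpha|u|^r - \beta|u|^s). \tag{52}$$

Thus, for $n$ large enough, we have, by Plancherel's formula for derivatives and (52),

$$T_{n2} = \frac{n}{2\pi} \int \left|\left(\Phi^H(u, h_+)\Phi^\varepsilon(u)\right)''\right|^2 du \le \frac{n}{2\pi} B_1^2 h_+^{1-r} \exp\!\left(\frac{2\alpha}{h_+^r}\right) \int_{\text{Dom}} |u|^{2a} \exp(-4\alpha|u|^r - 2\beta|u|^s)\,du$$

$$\le \frac{n}{\pi} B_1^2 h_+^{1-r} \exp\!\left(\frac{2\alpha}{h_+^r}\right) \int_{(\delta + 1/h_+^r)^{1/r}}^{\infty} u^{2a} \exp(-4\alpha u^r - 2\beta u^s)\,du \le \frac{n}{\pi} B_1^2 h_+^{1-r} \exp\!\left(-\frac{2\alpha}{h_+^r}\right) \int_{1/h_+}^{\infty} u^{2a} \exp(-2\beta u^s)\,du. \tag{53}$$

Plugging (49) with $\gamma' = a$ into (53) and using (78) of Lemma 9 in the Appendix we get

$$T_{n2} \le C n h_+^{s - r - 2a} \exp\!\left(-\frac{2\alpha}{h_+^r} - \frac{2\beta}{h_+^s}\right)(1 + o(1)) = o(1), \tag{54}$$

as $n \to \infty$, where $C > 0$ is a constant.

Combining (51) and (54) we get that $n\chi^2(f^Y_{n1}, f^Y_{n2}) \to 0$, as $n \to \infty$. □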
Proof of (23). We use the general scheme of Section 5.1 with $d(f_{n1}, f_{n2}) = |f_{n1}(0) - f_{n2}(0)|$. Choose $c_0 > 0$ in the definition of $f_0$ large enough to guarantee that assertion 2 of Lemma 2 holds. Lemma 2 implies that (34) and thus (33) are satisfied and that (32) holds with

$$\psi_n = \varphi_n \left[e^{-4\alpha\delta} - e^{-2\alpha(D - 2\delta)}\right].$$

Therefore, Lemma 4 of the Appendix implies that

$$R \ge \varphi_n \left[e^{-4\alpha\delta} - e^{-2\alpha(D - 2\delta)}\right](1 + o(1)),$$

as $n \to \infty$, where $R$ is defined in (31). Together with the reduction of Section 5.1, this yields that, as $n \to \infty$,

$$\inf_{T_n} R_n(0, T_n, \mathcal{A}_{\alpha,r}(L))\, \varphi_n^{-2} \ge \left[e^{-4\alpha\delta} - e^{-2\alpha(D - 2\delta)}\right]^2 (1 + o(1)).$$

Taking limits as $n \to \infty$ and then as $D \to \infty$ and $\delta \to 0$ we get (23) for $x = 0$. The proof for $x \ne 0$ is analogous (see the remark at the beginning of this section). □

5.3 Lower bound in $L_2$

Introduce the perturbation function

$$\Phi^H(u, h) = \sqrt{2\pi\alpha r L(d - 1)}\; h^{(1-r)/2} \exp\!\left(\frac{\alpha(d - 1)}{h^r}\right) \exp(-\alpha d|u|^r)\, \Phi^G\!\left(|u|^r - \frac{1}{h^r}\right), \tag{55}$$

where $\Phi^G$ is a function satisfying the properties given in Lemma 1 and $d = d(\delta) > 1$ is a constant depending on the value $\delta$ that appears in the construction of $\Phi^G$. The argument below is similar to that of Section 5.2 modulo the choice of the perturbation function (55) which is slightly different from (38). The argument goes through with $d$ such that $d(\delta) \to \infty$ and $\delta d(\delta) \to 0$ as $\delta \to 0$, but we will set for simplicity $d(\delta) = \delta^{-1/2}$ and assume that $0 < \delta < 1$, which ensures that $d(\delta) > 1$.



Lemma 3 Let $f_{n1}$ and $f_{n2}$ be the functions defined by their Fourier transforms (37), (55) with $\Phi^G$ satisfying the properties of Lemma 1 and $0 < \delta < 1$. Then we have the following.

1. The functions $f_{n1}$ and $f_{n2}$ are probability densities for $n$ large enough.

2. The functions $f_{n1}$ and $f_{n2}$ belong to $\mathcal{A}_{\alpha,r}(L)$ for $n$ large enough if $c_0 > 0$ in the definition of $f_0$ is large enough.

3. The $L_2$ distance between $f_{n1}$ and $f_{n2}$ satisfies

$$\|f_{n1} - f_{n2}\|_2 \ge 2\varphi_n(L_2) \left((1 - \sqrt{\delta})\left[e^{-4\alpha\sqrt{\delta}} - e^{-2\alpha(D - 2\delta)/\sqrt{\delta}}\right]\right)^{1/2} (1 + o(1)),$$

as $n \to \infty$.

4. The $\chi^2$-divergence $\chi^2(f^Y_{n1}, f^Y_{n2})$ satisfies (34).



Proof. 1. The argument is analogous to the proof of assertion 1 of Lemma 2. In particular, one also has $|H(x, h_+)| \le C_H'(|x|^3 + 1)^{-1}$, $\forall x \in \mathbb{R}$, and $\|H(\cdot, h_+)\|_\infty = o(1)$, as $n \to \infty$, for some constant $C_H' < \infty$. We omit the details.

2. We have by (55) and Lemma 1

$$\int |\Phi^H(u, h_+)|^2 \exp(2\alpha|u|^r)\,du \le 2\pi\alpha r L(d - 1)\, h_+^{1-r} \exp\!\left(\frac{2\alpha(d - 1)}{h_+^r}\right) \int_{\text{Dom}} \exp(-2\alpha(d - 1)|u|^r)\,du$$

$$\le 4\pi\alpha r L(d - 1)\, h_+^{1-r} \exp\!\left(\frac{2\alpha(d - 1)}{h_+^r}\right) \int_{(\delta + 1/h_+^r)^{1/r}}^{\infty} \exp(-2\alpha(d - 1)u^r)\,du.$$

By Lemma 6,

$$\int_{(\delta + 1/h_+^r)^{1/r}}^{\infty} \exp(-2\alpha(d - 1)u^r)\,du = \frac{h_+^{r-1}}{2\alpha(d - 1) r} \exp\!\left(-2\alpha(d - 1)\left(\delta + \frac{1}{h_+^r}\right)\right)(1 + o(1)),$$

as $n \to \infty$. We get therefore,

$$\int |\Phi^H(u, h_+)|^2 \exp(2\alpha|u|^r)\,du \le 2\pi L \exp(-2\alpha(d - 1)\delta)(1 + o(1)),$$

as $n \to \infty$, for any fixed $\delta > 0$. Now, since $d = \delta^{-1/2}$ we get that the last exponent is strictly less than 1 for $0 < \delta < 1$, and thus the argument similar to that after formula (42) can be applied to show that

$$\int |\Phi_{nj}(u)|^2 \exp(2\alpha|u|^r)\,du \le 2\pi L, \quad j = 1, 2,$$

for $n$ large enough, if $c_0 > 0$ in the definition of $f_0$ is chosen large enough.
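3. The $L_2$ distance is

$$\|f_{n1} - f_{n2}\|_2^2 = \frac{1}{2\pi} \int (\Phi_{n1}(u) - \Phi_{n2}(u))^2\,du = \frac{2}{\pi} \int |\Phi^H(u, h_+)|^2\,du$$

$$= 4L\alpha r(d - 1)\, h_+^{1-r} \exp\!\left(\frac{2\alpha(d - 1)}{h_+^r}\right) \int \exp(-2\alpha d|u|^r) \left(\Phi^G\!\left(|u|^r - \frac{1}{h_+^r}\right)\right)^2 du$$

$$\ge 4L\alpha r(d - 1)\, h_+^{1-r} \exp\!\left(\frac{2\alpha(d - 1)}{h_+^r}\right) \cdot 2 \int_{(2\delta + 1/h_+^r)^{1/r}}^{(D - 2\delta + 1/h_+^r)^{1/r}} \exp(-2\alpha d u^r)\,du, \tag{56}$$

where we used the left inequality in (ii) of Lemma 1. Lemma 6 implies that (cf. (44)):

$$\int_{(2\delta + 1/h_+^r)^{1/r}}^{(D - 2\delta + 1/h_+^r)^{1/r}} \exp(-2\alpha d u^r)\,du = \frac{h_+^{r-1}}{2\alpha d r} \exp\!\left(-\frac{2\alpha d}{h_+^r}\right) \left[e^{-4\alpha d\delta} - e^{-2\alpha d(D - 2\delta)}\right](1 + o(1)),$$

as $n \to \infty$. Substituting this into (56) and using (77) of Lemma 9 we obtain

$$\|f_{n1} - f_{n2}\|_2^2 \ge 4L\, \frac{d - 1}{d} \exp\!\left(-\frac{2\alpha}{h_+^r}\right) \left[e^{-4\alpha d\delta} - e^{-2\alpha d(D - 2\delta)}\right](1 + o(1))$$

$$= 4L \exp\!\left(-\frac{2\alpha}{h_*^r}\right)(1 - \sqrt{\delta}) \left[e^{-4\alpha\sqrt{\delta}} - e^{-2\alpha(D - 2\delta)/\sqrt{\delta}}\right](1 + o(1))$$

$$= 4\varphi_n^2(L_2)(1 - \sqrt{\delta}) \left[e^{-4\alpha\sqrt{\delta}} - e^{-2\alpha(D - 2\delta)/\sqrt{\delta}}\right](1 + o(1)),$$

as $n \to \infty$ (cf. the definition of $\varphi_n(L_2)$ in (15)).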

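4. Similarly to the proof of assertion 4 of Lemma 2 we obtain

$$n\chi^2(f^Y_{n1}, f^Y_{n2}) \le c_4'(T_{n1} + T_{n2}), \tag{57}$$

for $n$ and $M$ large enough, where $T_{n1}$ and $T_{n2}$ are defined in (47) and $c_4' < \infty$ is a constant. The only difference from the proof of Lemma 2 is that the function $H(x) = H(x, h_+)$ is now defined as the inverse Fourier transform of (55) and not as that of (38). As in (48)-(51), we get, for $n$ large enough,

$$T_{n1} = n\|H * f^\varepsilon\|_2^2 \le b_{\max}^2\, \alpha r L(d - 1)\, n h_+^{1-r} \exp\!\left(\frac{2\alpha(d - 1)}{h_+^r}\right) \int_{\text{Dom}} |u|^{2\gamma'} \exp(-2\alpha d|u|^r - 2\beta|u|^s)\,du$$

$$\le c' n h_+^{1-r} \exp\!\left(-\frac{2\alpha}{h_+^r}\right) \int_{1/h_+}^{\infty} u^{2\gamma'} \exp(-2\beta u^s)\,du \le c'' n h_+^{s - r - 2\gamma'} \exp\!\left(-\frac{2\alpha}{h_+^r} - \frac{2\beta}{h_+^s}\right) = o(1), \tag{58}$$

as $n \to \infty$, where $c' > 0$ and $c'' > 0$ are some finite constants.

Next, similarly to (52), we have, for $n$ large enough and all $u \in \mathbb{R}$,

$$\left|\left(\Phi^H(u, h_+)\Phi^\varepsilon(u)\right)''\right| \le B_2 h_+^{(1-r)/2} \exp\!\left(\frac{\alpha(d - 1)}{h_+^r}\right) |u|^{a'} \exp(-\alpha d|u|^r - \beta|u|^s),$$

where $B_2 < \infty$ and $a' \in \mathbb{R}$ are some constants. This implies, as in (53)-(54), that

$$T_{n2} = \frac{n}{2\pi} \int \left|\left(\Phi^H(u, h_+)\Phi^\varepsilon(u)\right)''\right|^2 du \le \frac{n}{\pi} B_2^2 h_+^{1-r} \exp\!\left(-\frac{2\alpha}{h_+^r}\right) \int_{1/h_+}^{\infty} u^{2a'} \exp(-2\beta u^s)\,du$$

$$\le c\, n h_+^{s - r - 2a'} \exp\!\left(-\frac{2\alpha}{h_+^r} - \frac{2\beta}{h_+^s}\right) = o(1), \tag{59}$$

as $n \to \infty$, where $c > 0$ is a finite constant. It remains now to combine (57)-(59). □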

Proof of (24) is now obtained following the same lines as the proof of (23) in Section 5.2 but with $d(f_{n1}, f_{n2}) = \|f_{n1} - f_{n2}\|_2$ and

$$\psi_n = \varphi_n(L_2) \left((1 - \sqrt{\delta})\left[e^{-4\alpha\sqrt{\delta}} - e^{-2\alpha(D - 2\delta)/\sqrt{\delta}}\right]\right)^{1/2}.$$

□
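Both lower bounds carry a factor depending on $\delta$ and $D$ that disappears in the iterated limit $D \to \infty$, $\delta \to 0$. The short sketch below evaluates these two factors for a few values; the function names and the chosen values of $\alpha$, $\delta$, $D$ are illustrative assumptions.

```python
import math

def pointwise_factor(alpha: float, delta: float, D: float) -> float:
    """Bracket e^{-4 a d} - e^{-2 a (D - 2 d)} from the pointwise lower bound."""
    return math.exp(-4 * alpha * delta) - math.exp(-2 * alpha * (D - 2 * delta))

def l2_factor(alpha: float, delta: float, D: float) -> float:
    """Factor (1 - sqrt(d)) [e^{-4 a sqrt(d)} - e^{-2 a (D - 2 d)/sqrt(d)}] for L2."""
    root = math.sqrt(delta)
    bracket = math.exp(-4 * alpha * root) - math.exp(-2 * alpha * (D - 2 * delta) / root)
    return (1 - root) * bracket

alpha = 1.0
for delta, D in ((0.1, 5.0), (0.01, 20.0), (1e-4, 50.0), (1e-6, 100.0)):
    print(delta, D, pointwise_factor(alpha, delta, D), l2_factor(alpha, delta, D))
```

Both factors are positive whenever $D > 4\delta$ and $0 < \delta < 1$, and they approach 1 as $\delta$ shrinks and $D$ grows, which is why the constants in (23) and (24) are sharp.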



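6 Appendix

Let $(\mathcal{X}, \mathcal{A})$ and $(\Theta, \mathcal{T})$ be measurable spaces and let $P_1$ and $P_2$ be two probability measures on $(\mathcal{X}, \mathcal{A})$. Let $d : (\Theta \times \Theta, \mathcal{T} \otimes \mathcal{T}) \to (\mathbb{R}_+, \mathcal{B})$ be a non-negative measurable function where $\mathcal{B}$ is the Borel $\sigma$-algebra. Define

$$R = \inf_{\hat\theta} \max_{i \in \{1, 2\}} E_i[d(\hat\theta, \theta_i)],$$

where $\inf_{\hat\theta}$ denotes the infimum with respect to all the measurable mappings $\hat\theta : (\mathcal{X}, \mathcal{A}) \to (\Theta, \mathcal{T})$, $E_i$ denotes the expectation with respect to $P_i$, and $\theta_1$, $\theta_2$ are two elements of $\Theta$.

Lemma 4 Suppose that:

(i) $d(\cdot, \cdot)$ satisfies the triangle inequality,

(ii) $\theta_1, \theta_2 \in \Theta$ are such that $d(\theta_1, \theta_2) \ge 2\psi$, for some $\psi > 0$,

(iii) $P_2 \ll P_1$ and there exist constants $\tau > 0$ and $0 < \gamma_0 < 1$ such that

$$P_1\!\left(\frac{dP_2}{dP_1} \ge \tau\right) \ge 1 - \gamma_0.$$

Then

$$R \ge \psi(1 - \gamma_0) \min\{\tau, 1\}. \tag{60}$$

Furthermore, if instead of (iii) we suppose that

(iv) $\chi^2(P_1, P_2) \le \gamma_0^2$, where $0 < \gamma_0 < 1$ and

$$\chi^2(P_1, P_2) = \int \left(\frac{dP_2}{dP_1} - 1\right)^2 dP_1,$$

then

$$R \ge \psi(1 - \gamma_0)(1 - \sqrt{\gamma_0}). \tag{61}$$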
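The passage from condition (iv) to condition (iii) is a Chebyshev-type step, and it can be illustrated on a pair of discrete distributions. The sketch below reads (iv) as $\chi^2(P_1, P_2) \le \gamma_0^2$ and takes $\tau = 1 - \sqrt{\gamma_0}$; the two probability vectors are made-up examples, not data from the paper.

```python
import math

def chi2(p1, p2):
    """Chi-square divergence sum (p2/p1 - 1)^2 * p1 for discrete laws with p1 > 0."""
    return sum((b / a - 1.0) ** 2 * a for a, b in zip(p1, p2))

def mass_ratio_at_least(p1, p2, tau):
    """P1-probability of the event {dP2/dP1 >= tau}."""
    return sum(a for a, b in zip(p1, p2) if b / a >= tau)

p1 = [0.25, 0.25, 0.25, 0.25]          # hypothetical reference law
p2 = [0.28, 0.22, 0.26, 0.24]          # hypothetical perturbed law
gamma0 = math.sqrt(chi2(p1, p2))       # smallest gamma0 allowed by (iv)
tau = 1.0 - math.sqrt(gamma0)
mass = mass_ratio_at_least(p1, p2, tau)
factor = (1.0 - gamma0) * (1.0 - math.sqrt(gamma0))  # multiplies psi in (61)
print(gamma0, tau, mass, factor)
```

The printed mass is at least $1 - \gamma_0$, as condition (iii) requires, and `factor` is the constant in front of $\psi$ in the resulting lower bound; it tends to 1 as the divergence tends to 0.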
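Proof. We first show (60). We have

$$R \ge \frac{1}{2} \inf_{\hat\theta} \left\{E_1[d(\hat\theta, \theta_1)] + E_2[d(\hat\theta, \theta_2)]\right\} \ge \frac{1}{2} \inf_{\hat\theta} \left\{E_1[d(\hat\theta, \theta_1)] + \tau E_1\!\left[I\!\left(\frac{dP_2}{dP_1} \ge \tau\right) d(\hat\theta, \theta_2)\right]\right\}$$

$$\ge \frac{\min\{\tau, 1\}}{2} \inf_{\hat\theta} E_1\!\left[I\!\left(\frac{dP_2}{dP_1} \ge \tau\right)\left(d(\hat\theta, \theta_1) + d(\hat\theta, \theta_2)\right)\right].$$

Using here the triangle inequality and (ii)-(iii), we find

$$R \ge \psi \min\{\tau, 1\}\, P_1\!\left(\frac{dP_2}{dP_1} \ge \tau\right) \ge \psi(1 - \gamma_0) \min\{\tau, 1\}.$$

To show (61) it is sufficient to note that, in view of Chebyshev's inequality,

$$P_1\!\left(\frac{dP_2}{dP_1} \ge 1 - \sqrt{\gamma_0}\right) = 1 - P_1\!\left(\frac{dP_2}{dP_1} - 1 < -\sqrt{\gamma_0}\right) \ge 1 - \frac{1}{\gamma_0} \int \left(\frac{dP_2}{dP_1} - 1\right)^2 dP_1 \ge 1 - \gamma_0,$$

and thus (iv) implies (iii) with $\tau = 1 - \sqrt{\gamma_0}$. □

Lemma 5 For $0 < \alpha, r, L < \infty$,

$$\sup_{f \in \mathcal{A}_{\alpha,r}(L)}\, \sup_{x \in \mathbb{R}} |f(x)| \le L + \pi^{-1} C(r, \alpha),$$

where $C(r, \alpha) = \int_0^\infty \exp(-2\alpha u^r)\,du$.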

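Proof. Let $\Phi = \Phi^X$ be the characteristic function of $f$. Clearly,

$$|f(x)| \le \frac{1}{2\pi} \int |\Phi(u)|\,du, \quad \forall x \in \mathbb{R}. \tag{62}$$

By Markov's inequality

$$\int |\Phi(u)|\, I\big(|\Phi(u)| \exp(2\alpha|u|^r) > 1\big)\,du \le \int \exp(2\alpha|u|^r)\, |\Phi(u)|^2\,du \le 2\pi L.$$

Also,

$$\int |\Phi(u)|\, I\big(|\Phi(u)| \exp(2\alpha|u|^r) \le 1\big)\,du \le 2 \int_0^\infty \exp(-2\alpha u^r)\,du = 2C(r, \alpha).$$

Combining the last two inequalities with (62) proves the Lemma. □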
Lemma 6 For any positive $\alpha$, $\beta$, $r$, $s$ and for any $A \in \mathbb{R}$ and $B \in \mathbb{R}$, we have

$$\int_v^\infty u^A \exp(-\alpha u^r)\,du = \frac{1}{\alpha r}\, v^{A + 1 - r} \exp(-\alpha v^r)(1 + o(1)), \quad v \to \infty, \tag{63}$$

and

$$\int_0^v u^B \exp(\beta u^s)\,du = \frac{1}{\beta s}\, v^{B + 1 - s} \exp(\beta v^s)(1 + o(1)), \quad v \to \infty. \tag{64}$$

Proof of this lemma is omitted. It is based on integration by parts and standard evaluations of integrals.
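The asymptotic relation (63) is easy to probe numerically. The following sketch computes the ratio of the tail integral to its leading term with a composite Simpson rule; it assumes $r \ge 1$ so that a simple truncation point can be justified by convexity, and the function name `tail_ratio` is a hypothetical helper.

```python
import math

def tail_ratio(A, alpha, r, v, steps=20000):
    """Ratio of the tail integral in (63) to its leading term; assumes r >= 1."""
    # Integrate u^A * exp(-alpha*(u^r - v^r)) to avoid underflow for large v.
    # For r >= 1, u^r - v^r >= r*v^(r-1)*(u - v), so beyond `length` the
    # exponential factor is below e^-50 and the truncation error is negligible.
    length = 50.0 / (alpha * r * v ** (r - 1))
    h = length / steps  # steps must be even for Simpson's rule

    def g(t):
        u = v + t
        return u ** A * math.exp(-alpha * (u ** r - v ** r))

    total = g(0.0) + g(length)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(i * h)
    integral = total * h / 3.0
    leading = v ** (A + 1 - r) / (alpha * r)
    return integral / leading

for v in (2.0, 5.0, 10.0, 20.0):
    print(v, tail_ratio(0.0, 1.0, 2.0, v), tail_ratio(1.5, 0.5, 1.0, v))
```

The printed ratios approach 1 as $v$ grows, in line with the $(1 + o(1))$ factor in (63).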

Lemma 7 Let $p$ be the density of the stable symmetric distribution with characteristic function $\exp(-|t|^r)$, $1 < r < 2$. Then $p$ is continuous, $p(x) > 0$ for all $x \in \mathbb{R}$ and there exist $c_1 > 0$, $c_2 > 0$ such that

$$p(x) \ge c_1 |x|^{-r-1}$$

for $|x| \ge c_2$.

Proof. From Zolotarev (1986), Th. 2.2.3, formula (2.2.18), we get

$$p(x) = \frac{r|x|^{1/(r-1)}}{2(r - 1)} \int_0^1 U(\varphi) \exp\!\left(-|x|^{r/(r-1)}\, U(\varphi)\right) d\varphi, \quad x \ne 0, \tag{65}$$

where

$$U(\varphi) = \left(\frac{\sin(\pi r\varphi/2)}{\cos(\pi\varphi/2)}\right)^{r/(1-r)} \frac{\cos(\pi(r - 1)\varphi/2)}{\cos(\pi\varphi/2)}.$$

Clearly, for $\varphi \in [1/2, 1]$ we have

$$1 \ge \cos(\pi(r - 1)\varphi/2) \ge \cos(\pi(r - 1)/2) > 0, \qquad c_3 \ge \sin(\pi r\varphi/2) \ge c_4 > 0,$$

where $c_3 > 0$ and $c_4 > 0$ are constants. Thus,

$$c_6 (\cos(\pi\varphi/2))^{1/(r-1)} \le U(\varphi) \le c_5 (\cos(\pi\varphi/2))^{1/(r-1)}, \quad \varphi \in [1/2, 1],$$

where $c_5 > 0$, $c_6 > 0$ are constants. Now, if $\varphi \in [1/2, 1]$,

$$c_7(1 - \varphi) \le \cos(\pi\varphi/2) \le c_8(1 - \varphi)$$

for some $c_7 > 0$, $c_8 > 0$. Finally,

$$c_{10}(1 - \varphi)^{1/(r-1)} \le U(\varphi) \le c_9(1 - \varphi)^{1/(r-1)}, \quad \varphi \in [1/2, 1].$$

Using (65) and the fact that $U(\varphi) \ge 0$ for $\varphi \in [0, 1]$, we get

$$p(x) \ge c|x|^{1/(r-1)} \int_{1/2}^{1} (1 - \varphi)^{1/(r-1)} \exp\!\left(-|x|^{r/(r-1)} c_9 (1 - \varphi)^{1/(r-1)}\right) d\varphi = c|x|^{1/(r-1)} \int_0^{1/2} \varphi^{1/(r-1)} \exp\!\left(-c_9 (|x|^r \varphi)^{1/(r-1)}\right) d\varphi.$$

Here and further on $c > 0$ are constants, possibly different on different occasions. By the change of variables $u = (|x|^r \varphi)^{1/(r-1)}$, we get

$$p(x) \ge c|x|^{1/(r-1)} \int_0^{(|x|^r/2)^{1/(r-1)}} \frac{u}{|x|^{r/(r-1)}} \exp(-c_9 u)\, \frac{u^{r-2}}{|x|^r}\,du = c|x|^{-1-r} \int_0^{(|x|^r/2)^{1/(r-1)}} u^{r-1} \exp(-c_9 u)\,du$$

$$\ge c|x|^{-1-r} \int_0^{(c_2^r/2)^{1/(r-1)}} u^{r-1} \exp(-c_9 u)\,du \ge c_1 |x|^{-1-r},$$

for $|x| \ge c_2 > 0$. This also implies that $p(x) > 0$, $\forall x \ne 0$, and

$$p(0) = (2\pi)^{-1} \int \exp(-|t|^r)\,dt \ne 0,$$
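hence $p$ is positive on $\mathbb{R}$. □

Lemma 8 Let $0 < r < s < \infty$ and let $h_* = h_*(n)$ be defined by (13):

$$\frac{2\alpha}{h_*^r} + \frac{2\beta}{h_*^s} = \log n - (\log\log n)^2.$$

Let $h_n$ satisfy

$$b \log h_n + \frac{2\beta}{h_n^s} + \frac{2\alpha}{h_n^r} = \log n + C(1 + o(1)), \quad n \to \infty,$$

for some $b \in \mathbb{R}$ and $C \in \mathbb{R}$. Then, as $n \to \infty$, we have

$$h_*(n) = \left(\frac{\log n}{2\beta}\right)^{-1/s}(1 + o(1)),$$

$$h_n^a \exp\!\left(-\frac{2\alpha}{h_n^r}\right) = h_*^a \exp\!\left(-\frac{2\alpha}{h_*^r}\right)(1 + o(1))$$

for any $a \in \mathbb{R}$, and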



^ /2/?\ / / 2a \ 



for $n$ large enough.



Proof. Define $x_* = h_*^{-s}$, $x_n = h_n^{-s}$, and write, for $t > 0$,

$$F(t) \stackrel{\text{def}}{=} 2\beta t + 2\alpha t^{r/s}, \qquad F_1(t) \stackrel{\text{def}}{=} (-b/s) \log t + 2\beta t + 2\alpha t^{r/s}.$$

Then

$$F(x_*) = \log n - (\log\log n)^2, \qquad F_1(x_n) = \log n + C(1 + o(1)), \tag{71}$$

for a constant $C \in \mathbb{R}$. We first prove that $x_n$ satisfies

$$F(x_n) = \log n + C_1 \log\log n\,(1 + o(1)) + C_2(1 + o(1)) \tag{72}$$

for some constants $C_1, C_2 \in \mathbb{R}$. In fact,

$$F_1'(t) = -\frac{b}{s}\cdot\frac{1}{t} + 2\beta + \frac{2\alpha r}{s}\, t^{r/s - 1} > 0,$$

for $t$ large enough, thus $F_1(t)$ is strictly monotone increasing for large $t$, and a solution $x_n$ of (71) exists for large $n$ (and is unique). Next, clearly,

$$\frac{F_1(t)}{2\beta t} \to 1, \quad t \to \infty,$$

and therefore $\log n/(2\beta x_n) \to 1$, as $n \to \infty$. Similarly, $\log n/(2\beta x_*) \to 1$, as $n \to \infty$, which yields the asymptotic expression for $h_*(n)$ stated in the lemma. Thus $(-b/s) \log x_n = (-b/s) \log\log n\,(1 + o(1))$, as $n \to \infty$, and we write $F(x_n) = F_1(x_n) + (b/s) \log x_n$ to get (72) in view of (71). We have

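$$x_n = F^{-1}(\log n + a_n), \qquad x_* = F^{-1}(\log n - b_n),$$

where $a_n = C_1 \log\log n\,(1 + o(1)) + C_2(1 + o(1)) = O(\log\log n)$, $b_n = (\log\log n)^2$ and $F^{-1}$ is the inverse of $F(\cdot)$. Hence, for some $0 < \tau < 1$ and for $n$ large enough,

$$x_n = F^{-1}(\log n + a_n) = x_* + (F^{-1})'(\log n - b_n)\,(a_n + b_n) + \frac{1}{2}\,(F^{-1})''\big(\log n - b_n(1 - \tau) + \tau a_n\big)\,(a_n + b_n)^2. \tag{73}$$

The first and the second derivatives of $F^{-1}$ are given by

$$(F^{-1})'(y) = \frac{1}{F'(F^{-1}(y))} = \frac{1}{2\beta + (2\alpha r/s)(F^{-1}(y))^{r/s - 1}}, \qquad (F^{-1})''(y) = \frac{-(2\alpha r/s)(r/s - 1)(F^{-1}(y))^{r/s - 2}}{\left(2\beta + (2\alpha r/s)(F^{-1}(y))^{r/s - 1}\right)^3}.$$

Hence

$$(F^{-1})'(\log n - b_n) = \frac{1}{2\beta + (2\alpha r/s)\, x_*^{r/s - 1}} = \frac{1}{2\beta} + o(1), \quad n \to \infty. \tag{74}$$

Next, it is easy to show that there exists $\bar y > 0$ such that

$$\frac{y}{4\beta} \le F^{-1}(y) \le \frac{y}{2\beta} \tag{75}$$

for $y \ge \bar y$. Considering $n$ large enough so that $y_n \stackrel{\text{def}}{=} \log n - b_n(1 - \tau) + a_n\tau \ge \bar y$ and using the above expression for $(F^{-1})''(y)$ and (75) we get

$$(F^{-1})''(y_n) = (F^{-1})''\big(\log n - b_n(1 - \tau) + \tau a_n\big) = O\big((\log n)^{r/s - 2}\big), \quad n \to \infty.$$

This and (73), (74) imply

$$x_* - x_n = -\left(\frac{1}{2\beta} + o(1)\right)(a_n + b_n) + O\!\left(\frac{(\log\log n)^4}{(\log n)^{2 - r/s}}\right) = -\frac{b_n}{2\beta}(1 + o(1)). \tag{76}$$

Using this representation we obtain

$$\exp\!\left(-\frac{2\alpha}{h_n^r} + \frac{2\alpha}{h_*^r}\right) = \exp\!\left(-2\alpha\big(x_n^{r/s} - x_*^{r/s}\big)\right) = \exp\!\left(-2\alpha x_*^{r/s}\left(\left[1 + b_n(2\beta x_*)^{-1}(1 + o(1))\right]^{r/s} - 1\right)\right) = \exp\!\left(O\big(b_n x_*^{r/s - 1}\big)\right) = 1 + o(1),$$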
since b n = (log log n) 2 , = (2/3) -1 log n (1 + o(l)) and r < s. This and the fact 
that (h n /K) a = {x*/x n ) a/s = 1 +o(l) imply (jHZj). Next, © follows directly from 
the definition of /i* and from To prove (joTJj) . note that, in view of (|75jl . 

^+2 7 -i ex P ( 7"7 - 7"7 J = (l + o(l))exp(2/3(x»-x n )) 
= (1 + o(l)) exp(-6„[l + o(l)]) < 1 

for n large enough. □ 

Lemma 9 Let $0 < r < s < \infty$ and let $h_+ = h_+(n)$ be the solution of (36). Then

$$h_+(n) = \left(\frac{\log n}{2\beta}\right)^{-1/s}(1 + o(1)),$$

$$h_+^a \exp\!\left(-\frac{2\alpha}{h_+^r}\right) = h_*^a \exp\!\left(-\frac{2\alpha}{h_*^r}\right)(1 + o(1)), \tag{77}$$

and

$$(\log n)^b\, n \exp\!\left(-\frac{2\alpha}{h_+^r} - \frac{2\beta}{h_+^s}\right) = o(1), \tag{78}$$

as $n \to \infty$, for any $a \in \mathbb{R}$, $b \in \mathbb{R}$.

Proof is analogous to that of Lemma 8. □